Viral BridgeBench Post Claims Claude Opus 4.6 Was ‘Nerfed,’ Critics Call It Bad Science
BridgeMind AI claimed Anthropic’s Claude Opus 4.6 was secretly degraded after a hallucination benchmark retest. The viral post has since drawn sharp criticism for flawed methodology.
The claim triggered widespread debate over whether AI companies are quietly downgrading paid models to cut costs.
BridgeMind Claims a 98% Surge in Hallucinations
BridgeMind, the team behind the BridgeBench coding benchmark, posted that Claude Opus 4.6 had fallen from second to tenth place on its hallucination leaderboard. Accuracy reportedly dropped from 83.3% to 68.3%.
“CLAUDE OPUS 4.6 IS NERFED. BridgeBench just proved it. Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%. Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%,” they wrote.
The post framed this as evidence of “reduced reasoning levels.” However, a closer look at the underlying data tells a different story.
Critics Say the Comparison Is Fundamentally Flawed
According to computer scientist Paul Calcraft, the claim is “incredibly bad science,” and he highlighted a critical problem with the methodology.
“Incredibly bad science. You tested Opus on 30 tasks today, earlier score was on just *6* tasks. Results for the six tasks in common: 85.4% score today vs. 87.6% previously. Swing is mostly from a *single* fabrication without repeats – just statistical noise,” Calcraft commented.
The original high score came from just six benchmark tasks. The new retest expanded the benchmark to 30 tasks.
On the six overlapping tasks, performance was nearly identical, dropping only from 87.6% to 85.4%.
That small swing came largely from a single additional fabrication in one task. With no repeated runs, this falls well within normal statistical variance for AI models.
Large language models are not deterministic, and one bad output on a small sample can shift results considerably.
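To illustrate why a six-task sample is so noisy, here is a minimal sketch of a normal-approximation confidence interval for a benchmark accuracy. It assumes, for simplicity, binary pass/fail scoring rather than BridgeBench’s actual graded scores, so the numbers are illustrative only:

```python
import math

def accuracy_ci(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for an accuracy measured as correct/total.

    Uses the normal (Wald) approximation, clamped to [0, 1]. Illustrative only:
    it assumes independent pass/fail tasks, not graded per-task scores.
    """
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# On a 6-task sample, a single extra failure moves the point estimate by
# 1/6, about 16.7 percentage points: far larger than the 87.6% -> 85.4%
# swing on the overlapping tasks.
print(accuracy_ci(5, 6))    # very wide interval at 83.3% accuracy
print(accuracy_ci(25, 30))  # noticeably narrower at the same 83.3% accuracy
```

Even without repeated runs, the interval for six tasks spans tens of percentage points, which is why a single fabrication cannot support claims of a deliberate downgrade.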
Broader Frustrations Fuel the Narrative
Still, the post struck a nerve. Since its February 2026 launch, Claude Opus 4.6 has faced persistent complaints about perceived quality decline.
Developers report shorter responses, weaker instruction-following, and reduced reasoning depth during peak hours.
Some of this traces to deliberate product changes. Anthropic introduced adaptive thinking controls that let the model self-adjust its reasoning budget. The default effort level was later set to medium, prioritizing efficiency over maximum depth.
An independent analysis of over 6,800 Claude Code sessions found reasoning depth dropped roughly 67% by late February.
The model’s file-read ratio before editing code fell from 6.6 to 2.0. That suggests it attempted fixes on code it had barely reviewed.
What This Means for AI Users
This reflects a growing tension in the AI industry. Companies optimize models for cost and scale after launch, while heavy users expect consistent peak performance. The gap between these priorities erodes trust.
Based on the available evidence, the BridgeBench data does not prove a deliberate downgrade. The benchmark comparison was apples-to-oranges, and the overlapping results were nearly identical.
However, the underlying frustration is not entirely baseless. Adaptive compute controls and service-level optimizations have changed how Claude Opus 4.6 behaves in practice. For developers relying on consistent output, these changes matter.
Anthropic has not issued a public statement on the specific BridgeBench claims as of April 13.
The post Viral BridgeBench Post Claims Claude Opus 4.6 Was ‘Nerfed,’ Critics Call It Bad Science appeared first on BeInCrypto.
