GPT-5.4 Pro jumps to 150 IQ on Mensa Norway test as OpenAI breaks its own record

OpenAI’s newest GPT-5.4 Pro model has now achieved an IQ score higher than 99.96% of all humans, giving markets a fresh signal that AI capability gains are beginning to outpace the usual product-cycle noise.

OpenAI’s GPT-5.4 Pro touches 150 on public IQ benchmark as markets enter another macro-heavy week

TrackingAI’s public leaderboard now places OpenAI’s GPT-5.4 Pro at an IQ score of 150, a sharp step up from the 136 that OpenAI’s o3 posted on the Mensa Norway test last year.

The jump arrives at a moment when market attention has narrowed around Iran, energy, labor softness, and the next inflation print. That creates a different question for the week ahead: how quickly is machine intelligence compounding, and when will that acceleration begin to overlap with financial positioning?

Why this matters: A move from 136 to 150 on a widely understood benchmark compresses a complex capability shift into a simple signal. For companies, that signal feeds directly into decisions around automation, software budgets, and headcount planning. For markets, it adds another variable alongside rates, inflation, and growth expectations.

OpenAI introduced GPT-5.4 as its most capable and efficient frontier model for professional work, with stronger coding, tool use, and computer use, and a context window of up to 1 million tokens. In the same release, OpenAI said GPT-5.4 achieved a new state of the art on GDPval and exceeded human performance on OSWorld-Verified.

Those benchmarks are separate from a public IQ test, but the direction of travel aligns. Capability is rising across separate measurement systems, and that rise is becoming fast enough to affect budgeting, hiring plans, workflow design, and software spend.

A score of 150 on a public IQ-style benchmark compresses a broader capability move into a single, portable signal. The number is easy to understand even before the methodology is debated.

The earlier o3 Mensa result established the benchmark and its limits. GPT-4.1’s one-million-token context window showed how OpenAI was extending model utility across long-horizon code and document tasks, while our analysis of OpenAI’s expanding capital loop linked model progress to hardware expansion, financing loops, and infrastructure demand.

Taken together, these developments place the latest IQ score within a broader commercial and economic context. A move from 136 to 150 on a public benchmark is striking on its own. A move from 136 to 150 while OpenAI is pushing deeper into tool use, computer use, enterprise productivity, and capital-intensive infrastructure carries broader implications.

Public IQ benchmarks are limited, but the capability curve is still moving higher

Public IQ-style tests remain imperfect instruments for measuring frontier models. TrackingAI runs a public Mensa-style benchmark and also maintains a harder private offline test.

IQ-style tests compress a narrow slice of cognitive performance into a single number, obscuring variation across reasoning types, context handling, creativity, and real-world problem-solving.

For AI and humans alike, scores are sensitive to test design, training exposure, and pattern familiarity, which makes them a noisy proxy for general capability.

An IQ of 150 sits at the extreme upper tail of the distribution, often associated with individuals such as Albert Einstein or Richard Feynman. In practical terms, it implies very fast abstraction, strong pattern recognition, and the ability to navigate complex, multi-step problems with limited guidance.
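
As a rough check on the percentile figures, an IQ score can be converted to a population percentile with the normal cumulative distribution function. The sketch below assumes scores are normally distributed with a mean of 100 and a standard deviation of 15, a common IQ-scale convention that is an assumption here rather than something stated in TrackingAI’s methodology. The function name `iq_percentile` is illustrative.

```python
import math

def iq_percentile(iq, mean=100.0, sd=15.0):
    """Percent of a normal population scoring below `iq`.

    Assumes a normal distribution with the given mean and standard
    deviation (sd=15 is an assumption, not a sourced figure).
    """
    z = (iq - mean) / sd
    # Standard normal CDF expressed through the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for score in (136, 150):
    print(f"IQ {score}: above about {iq_percentile(score):.2f}% of the population")
```

Under those assumptions, 150 works out to roughly the 99.96th percentile, in line with the figure cited for GPT-5.4 Pro, and 136 comfortably clears the 98% mark. A different standard deviation would shift both numbers.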

The platform reports scores as rolling averages across recent completions, and the methodology raises familiar questions around prompt structure, reproducibility, training-set contamination, and format familiarity. Those concerns were already visible when o3 reached 136, and they remain active now that GPT-5.4 Pro sits at 150.
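
To illustrate what a rolling average over recent completions means in practice, here is a minimal sketch. The window size and the per-run scores are hypothetical and chosen only for illustration; they are not TrackingAI data, and the platform’s actual window is not given in this article.

```python
from collections import deque

def rolling_mean(scores, window):
    """Yield the mean of the most recent `window` scores after each new score."""
    recent = deque(maxlen=window)  # older scores drop out automatically
    for score in scores:
        recent.append(score)
        yield sum(recent) / len(recent)

# Hypothetical per-run scores, not real leaderboard data.
runs = [142, 155, 148, 151, 154]
averages = list(rolling_mean(runs, window=3))
print(averages)
```

Because each reported figure blends several runs, a single unusually high or low completion moves the headline number less than it would on a one-shot test, which is one reason reproducibility questions focus on the underlying runs rather than the average.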

Related Reading

OpenAI’s o3 scores 136 on Mensa Norway test, surpassing 98% of human population

OpenAI’s o3 model reaches Mensa-Level IQ in independent testing.

Apr 17, 2025
·
Liam ‘Akiba’ Wright

Even with these limits, the broader pattern has become harder to dismiss. One isolated benchmark result can be explained away as a quirk. A cluster of gains across public IQ-style testing, coding, browser use, desktop navigation, and knowledge-work performance carries more analytical weight.

TrackingAI’s latest leaderboard places GPT-5.4 Pro at the top of its public IQ board, ahead of all Claude, Gemini, Qwen, and Grok models, offering an external, legible public benchmark that maps quickly onto the broader capability debate.

Few people need a detailed understanding of benchmark design to grasp that 150 sits in a rare range, and investors don’t need to accept every premise behind an IQ-style test to recognize that a jump of this size suggests acceleration rather than drift.

Chart titled “AI IQ Test Results” showing average Mensa Norway IQ scores for major AI models on a bell curve, with OpenAI’s GPT-5.4 variants plotted near the top end of the range.

Enterprise buyers also don’t need to believe that IQ equals general intelligence to see that systems with stronger pattern recognition, stronger tool use, and stronger long-horizon task handling are moving toward economically useful territory, extending far beyond puzzle-solving.

This points toward systems that can search, plan, verify, navigate, and produce real work across extended contexts. In that setting, the IQ score functions less as a novelty number and more as a signal of the density of frontier reasoning.

There is also competitive value in the leaderboard itself. A leading position on a public benchmark reinforces OpenAI’s standing in the race for visible capability leadership, especially at a moment when model differentiation is becoming harder to discern from architecture notes alone.

Benchmark leadership compresses complexity into a simple hierarchy. It offers developers a signal, enterprise buyers a narrative handle, and investors another proxy for where the capability frontier currently sits.

OpenAI’s benchmark climb is beginning to overlap with the economic week ahead

The week ahead still runs through macro. The Bureau of Labor Statistics calendar clearly lays out the next key releases: the FOMC minutes from the March 17 to 18 meeting, due on April 8; the March Consumer Price Index, due on April 10; and the March Producer Price Index, due on April 14.

That schedule keeps rates, inflation, and growth anxiety in the foreground, but beneath that surface, a second economic track is taking shape, and OpenAI sits near its center.

Capability growth in frontier AI increasingly intersects with capital allocation. A model that pushes higher on public reasoning tests while also improving in coding, search, and computer use changes how companies think about workflow redesign. It changes what software buyers expect from copilots and agents. It changes how quickly enterprises move from experimentation toward deployment.

Jack Dorsey recently posted that Block is moving “from hierarchy to intelligence,” using AI to take over coordination work once handled by management layers as the company reorganizes around individual contributors, directly responsible individuals, and player-coaches.

Capability growth also changes which tasks can be carved out of labor cost structures and reassigned to software. These effects move through narrower channels first, including document workflows, spreadsheet workflows, customer support, research tasks, browser automation, internal operations, code generation, and verification loops.

OpenAI’s commercial direction reinforces that interpretation. In its GPT-5.4 launch materials, the company described stronger performance in professional work, stronger tool search, native computer use, and gains in benchmarked knowledge work across occupations that map directly onto the U.S. economy.

That places AI capability growth inside a familiar market question: where spending flows next if these systems continue improving at this pace.

The answer extends beyond model subscription revenue into cloud demand, chips, data centers, networking, power, software licenses, and labor productivity assumptions. OpenAI’s expanding capital loop already reflects part of that structure, and the benchmark gain adds a simpler public-facing signal on top of it.

That overlap is what gives the latest result broader relevance during a macro-heavy week. Markets already know the CPI setup. Markets already know oil prices can feed into inflation expectations. Markets already know the Fed minutes will be parsed for policy tone.

But is the growth in intelligence itself beginning to behave like a macro variable? Faster capability gains can alter enterprise spending plans, tighten competitive pressure across white-collar functions, support larger infrastructure outlays, and strengthen the case for AI-linked capital expenditure even in a slower nominal growth environment.

When TrackingAI shows GPT-5.4 Pro at 150, the number lands in a market that already views OpenAI as more than a lab. It is a platform company, a deployment company, an infrastructure customer, and a signal generator for adjacent sectors.

The next test sits in two places at once. One is methodological: public IQ-style benchmarks will keep drawing scrutiny, and they should. The other is economic: markets will decide, step by step, whether capability jumps of this size deserve to be priced alongside labor data, rate expectations, and capital spending trends.

OpenAI’s latest benchmark climb pushes that decision closer. The score is compact, legible, and easy to circulate. Its deeper relevance comes from the same place as the company’s broader product push: the frontier is still climbing, and the economic footprint of that climb is becoming harder to keep in a separate category.

The post GPT-5.4 Pro jumps to 150 IQ on Mensa Norway test as OpenAI breaks its own record appeared first on CryptoSlate.
