Microsoft Introduced Critique, A New Multi-Model Deep Research System In M365 Copilot

Microsoft has launched Critique, a new multi-model deep research system inside Researcher, the deep research agent in Microsoft 365 Copilot, as part of a broader push to make Copilot feel more reliable for serious knowledge work instead of just fast drafting.
According to Microsoft, Critique is designed for complex research tasks and works by splitting the job into two parts: one model handles planning, retrieval, synthesis, and drafting, while a second model reviews and refines the output before the final report is produced. Microsoft says the system uses models from frontier labs including OpenAI and Anthropic, and that it is available now through the company’s Frontier program.
Reuters reported that in Critique’s current setup, OpenAI’s GPT generates the response and Anthropic’s Claude reviews it for accuracy and quality before the answer reaches the user. Microsoft has also said it wants this workflow to become bi-directional later on, allowing models to review each other in both directions.
What Critique actually does inside Microsoft 365 Copilot
Microsoft’s own description makes it clear that Critique is not just a cosmetic feature or a new button slapped onto Copilot. It works inside Researcher in Microsoft 365 Copilot and is built for deeper tasks where getting it right matters just as much as getting it done fast. One model does the digging and drafts the report, while the second steps in like an editor, checking the facts, sharpening the structure, and helping turn it into a more reliable final piece.
Microsoft says the whole idea is to separate generation from evaluation, rather than asking one model to brainstorm, write, fact-check, and polish its own work all at once. That distinction matters because a lot of AI failure comes from exactly that one-model bottleneck. When a single system is asked to do everything, it can produce something that looks polished while quietly leaving gaps, overreaching on claims, or leaning on weak evidence.
Microsoft says Critique’s review layer is built around rubric-based evaluation, with attention to source reliability, report completeness, and strict evidence grounding. In plain English, the second model is there to ask whether the draft actually answered the question, whether the sourcing is solid, and whether the final narrative is supported instead of merely sounding confident.
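To make the generate-then-review split concrete, here is a minimal Python sketch of the pattern. It is not Microsoft's implementation: the function names, the `Review` type, the retry limit, and the toy stand-in models are all hypothetical, and only the three rubric dimensions come from Microsoft's description.

```python
from dataclasses import dataclass, field
from typing import List

# Rubric dimensions named in Microsoft's description; how each is checked is assumed.
RUBRIC = ["source reliability", "report completeness", "evidence grounding"]


@dataclass
class Review:
    # One entry per rubric dimension the reviewer flagged (hypothetical shape).
    issues: List[str] = field(default_factory=list)


def run_with_review(task, generate, review, revise, max_rounds=2):
    """One model drafts, a second reviews against the rubric, and the draft is revised."""
    draft = generate(task)
    for _ in range(max_rounds):
        result = review(task, draft, RUBRIC)
        if not result.issues:
            break  # the reviewer found nothing left to fix
        draft = revise(draft, result.issues)
    return draft


# Toy stand-ins for the two models, so the sketch runs without any real API.
def toy_generate(task):
    return f"Report on {task}."


def toy_review(task, draft, rubric):
    issues = []
    if "evidence grounding" in rubric and "[source]" not in draft:
        issues.append("evidence grounding")
    return Review(issues)


def toy_revise(draft, issues):
    if "evidence grounding" in issues:
        draft += " [source]"
    return draft


final_report = run_with_review("Q3 churn", toy_generate, toy_review, toy_revise)
print(final_report)
```

The point of the structure is that the reviewer never writes the first draft, so it judges the report against the rubric rather than against its own earlier choices.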
Microsoft is not pitching Critique as a side experiment
One of the more important details in Microsoft’s announcement is that Critique will be the default experience in Researcher when Auto is selected in the model picker. That signals the company sees this as more than an optional lab feature for power users. It is effectively treating multi-model review as the new baseline for deep research quality inside Microsoft 365 Copilot. That is a significant product choice, because it suggests Microsoft believes enterprise customers care less about raw response speed than they do about fewer hallucinations, stronger structure, and more confidence in the finished report.
That also fits neatly into Microsoft’s broader messaging around Wave 3 of Microsoft 365 Copilot, where the company has been pushing the idea of Copilot as a “system for work” built on a multi-model advantage rather than on any single AI lab. In Microsoft’s framing, Copilot is meant to pull the best available intelligence from across the industry, grounded in work context through what it calls Work IQ and protected by enterprise data controls. Critique is one of the clearest examples yet of that strategy moving from marketing language into a visible product feature.
The benchmark numbers are a big part of Microsoft’s sales pitch
Microsoft is not only saying Critique feels better. It is saying the system performed better on a formal benchmark. In its technical write-up, the company says it tested Critique on the DRACO benchmark, short for Deep Research Accuracy, Completeness, and Objectivity, which covers 100 complex research tasks across 10 domains. Microsoft says responses were judged on factual accuracy, breadth and depth of analysis, presentation quality, and citation quality, and that Critique outperformed the single-model version of Researcher on all four measures.
The company highlighted the largest gains in breadth and depth of analysis, followed by presentation quality and factual accuracy. It also says the improvements were statistically significant and that Researcher with Critique delivered a +7.0 point aggregated score improvement, or +13.88%, over Perplexity Deep Research (Claude Opus 4.6 model), which Microsoft described as the best system reported in the benchmark paper.
Data | Source: Microsoft
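For readers who want to see how the two headline figures relate, the short Python sketch below works backward from them. It assumes both the +7.0 points and the +13.88% refer to the same aggregated score measured against the same baseline. The derived scores are implied by that assumption, not figures reported by Microsoft.

```python
# Figures quoted in the article.
points_gain = 7.0        # absolute improvement in aggregated score
relative_gain = 0.1388   # the same improvement as a fraction of the baseline

# Assumption: both numbers are measured against one baseline score,
# so relative_gain = points_gain / baseline.
implied_baseline = points_gain / relative_gain
implied_critique_score = implied_baseline + points_gain

print(round(implied_baseline, 1), round(implied_critique_score, 1))  # prints: 50.4 57.4
```

Under that reading, the baseline system would sit at roughly 50 on the aggregated scale and Researcher with Critique at roughly 57.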
That is an eye-catching claim, especially because the deep research race has become one of the most competitive fronts in enterprise AI. Research tools are no longer being judged only by whether they can gather information, but by whether they can assemble a report that feels decision-ready.
Microsoft’s argument is that the review layer forces Researcher to identify missing angles, tighten organization, challenge weak claims, and use citations more carefully. Whether customers experience these gains in real workflows will matter more than benchmark charts, but Microsoft is clearly trying to signal that this is a measurable quality jump rather than a vague model update.
Council shows Microsoft is thinking beyond one “best answer”
Critique is not the only feature Microsoft introduced alongside this update. The company also launched Council, a multi-model comparison mode inside Researcher. Microsoft says Council runs Anthropic and OpenAI models simultaneously, allowing each to generate a full standalone report. A separate judge model then creates a distilled summary showing where the reports agree, where they diverge, and what each uniquely contributes. Microsoft Support describes this as Model Council, a mode that preserves both full reports and adds a comparison summary to help users decide which output is stronger or how to combine them.
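The Council flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Microsoft's design: the names `council` and `set_judge` are hypothetical, each report is modeled as a plain list of claims, and the judge is a simple set comparison standing in for a separate judge model.

```python
from concurrent.futures import ThreadPoolExecutor


def council(task, model_a, model_b, judge):
    """Run two models on the same task in parallel, keep both reports, add a summary."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(model_a, task)
        future_b = pool.submit(model_b, task)
        report_a, report_b = future_a.result(), future_b.result()
    return {
        "report_a": report_a,  # both full reports are preserved
        "report_b": report_b,
        "summary": judge(report_a, report_b),
    }


def set_judge(report_a, report_b):
    # Toy judge: treats each report as a set of claims and compares them.
    a, b = set(report_a), set(report_b)
    return {
        "agree": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


# Toy stand-ins for the two models.
def toy_model_a(task):
    return ["market is growing", "churn is rising"]


def toy_model_b(task):
    return ["churn is rising", "pricing is a risk"]


result = council("Q3 outlook", toy_model_a, toy_model_b, set_judge)
print(result["summary"])
```

Unlike the review loop in Critique, nothing here is merged or rewritten: the user gets both reports plus a map of where they overlap and where they differ.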
That is a very interesting signal about where enterprise AI may be heading. For a while, the industry behaved as if the goal was to find one model that could replace all the others. Microsoft’s latest move suggests the more realistic future may be one where companies do not trust any single model enough to make it the only voice in the room.
The timing of Critique is not accidental. Microsoft has been under pressure to show that Microsoft 365 Copilot is becoming more useful, more differentiated, and more valuable as competition intensifies.
Reuters tied the rollout of Critique and Council to Microsoft’s effort to improve Copilot adoption in a market where rivals including Google’s Gemini and Anthropic’s Claude products are pushing hard into workplace AI. Axios also noted that Microsoft’s multi-model strategy has another benefit: it shows the company is not locked into overdependence on OpenAI at a time when frontier model leadership can shift quickly.
The post Microsoft Introduced Critique, A New Multi-Model Deep Research System In M365 Copilot appeared first on Metaverse Post.
