Microsoft rolled out Critique, a two-model reviewer flow, and Council, a side-by-side multi-model mode inside M365 Copilot research workflows. Critique was reported at 57.4 on Draco and about 7 points above earlier Researcher versions.

Microsoft's new release is really two different multi-model patterns inside the same research workflow. In the Critique announcement, one model produces the research output, while a second model acts as reviewer. The reviewer is described as checking factual grounding, strengthening structure, and improving citation quality rather than generating from scratch. A separate product clip shows Critique surfaced directly in the Copilot interface as a research feature rather than a backend-only change.
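The generator-then-reviewer handoff described above can be sketched in a few lines. This is a minimal illustration, not Microsoft's implementation: `run_critique`, `CritiqueResult`, the review prompt wording, and the two toy model functions are all hypothetical stand-ins, and each "model" is assumed to be a plain function from text to text.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any callable that maps prompt text to output text.
Model = Callable[[str], str]


@dataclass
class CritiqueResult:
    draft: str   # output of the generator model
    final: str   # output after the reviewer pass


def run_critique(prompt: str, generator: Model, reviewer: Model) -> CritiqueResult:
    """Staged handoff: one model drafts, a second model reviews the draft."""
    draft = generator(prompt)
    # Hypothetical review instruction; the real reviewer prompt is not public.
    review_prompt = (
        "Review the draft below for factual grounding, structure, and "
        "citation quality. Return an improved version.\n\n"
        f"Question: {prompt}\n\nDraft:\n{draft}"
    )
    final = reviewer(review_prompt)
    return CritiqueResult(draft=draft, final=final)


# Toy stand-ins so the sketch runs without any model API.
def toy_generator(prompt: str) -> str:
    return f"DRAFT: answer to '{prompt}'"


def toy_reviewer(review_prompt: str) -> str:
    # Pull the draft back out of the review prompt and mark it as reviewed.
    draft = review_prompt.split("Draft:\n", 1)[1]
    return draft.replace("DRAFT:", "REVIEWED:") + " [citations checked]"
```

The key design point is that the reviewer receives the draft as input and edits it, rather than answering the original question from scratch.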
Council takes the parallel path instead. The Council demo shows the same prompt executed across multiple models at once, with outputs displayed side by side in the Researcher experience. That makes it a comparison mode for prompt-level variance, whereas Critique is a staged handoff between generator and reviewer. Together, the launch shifts M365 Copilot from single-model answering toward orchestrated multi-model research flows.
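The parallel pattern can be sketched the same way. Again this is an illustrative assumption rather than the product's actual code: `run_council` and the toy models are hypothetical names, and each model is assumed to be an async function so that the same prompt can be dispatched to all of them concurrently.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Assumption: each model is an async callable from prompt text to output text.
AsyncModel = Callable[[str], Awaitable[str]]


async def run_council(prompt: str, models: Dict[str, AsyncModel]) -> Dict[str, str]:
    """Fan the same prompt out to every model and collect outputs by model name."""
    names = list(models)
    # gather() runs the calls concurrently and returns results in input order.
    outputs = await asyncio.gather(*(models[name](prompt) for name in names))
    return dict(zip(names, outputs))


# Toy stand-ins so the sketch runs without any model API.
async def toy_model_a(prompt: str) -> str:
    await asyncio.sleep(0)  # placeholder for a network call
    return f"A says: {prompt}"


async def toy_model_b(prompt: str) -> str:
    await asyncio.sleep(0)
    return f"B says: {prompt}"
```

Unlike the staged handoff, no model sees another model's output here; the results are simply returned together for side-by-side comparison.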
The clearest performance claim is Microsoft's reported 57.4 score on Draco. In the benchmark post, that number is attached specifically to Critique, and the rollout summary says it is "+7.0" versus previous Researcher versions. The public detail here is limited, but the claimed gain is tied to the reviewer step: better factual accuracy, broader analysis, stronger presentation, and better citations.
The demo evidence also clarifies how Microsoft is differentiating the two modes in practice. The video teaser for Critique ends on "Draco Benchmark: 57.4," reinforcing it as the quality-focused path, while the Council interface shows a single Q2-summary prompt split into multiple concurrent outputs. For engineers, the implementation signal is less about a new base model and more about orchestration: M365 Copilot is exposing multi-model routing patterns as product features.
Microsoft has just released a VERY powerful feature. Copilot Critique allows one model from Anthropic/OpenAI to generate the research output and another one reviews it. → Model 1 plans, retrieves sources, synthesizes, and drafts → Model 2 evaluates claims, strengthens…
Introducing Critique, a new multi-model deep research system in M365 Copilot. You can use multiple models together to generate optimal responses and reports.
Microsoft launched "Council" for M365 Copilot, a new multi-model mode in which several models execute the same prompt simultaneously. Things are getting more and more interesting there. Coplexity
Microsoft announced Critique, a multi-model Deep Research solution for M365 Copilot, which achieved a score of 57.4 on the Draco benchmark. The future is multi-model