GPT-5.4: Long Context Reasoning Meets Computer Control

GPT‑5.4 is OpenAI’s new flagship GPT‑5‑series model. It combines stronger reasoning, coding, and agent-style “computer use” in one general‑purpose model, offers a context window of up to about 1M tokens via the API, and has significantly lower hallucination and error rates than GPT‑5.2.

Core capabilities

Reasoning and knowledge work: GPT‑5.4 is tuned for complex “real work” tasks involving documents, spreadsheets, and presentations, and scores about 83% on OpenAI’s internal GDPval benchmark for professional tasks.
Coding and agents: It incorporates the advanced coding abilities of GPT‑5.3‑Codex and is designed to drive tools and multi‑step workflows, not just produce text.

Computer-use and tools

Native computer control: GPT‑5.4 can write code to operate computers (e.g., through Playwright) and issue mouse/keyboard commands based on screenshots, enabling far more capable agents.
Tool & web use: It’s better at browsing and at orchestrating tools/APIs over multiple steps, including improved persistence across multi‑round tool calls and web interactions.
New “Tool Search” in the API: Instead of stuffing all tool definitions into the prompt, the model can look up only the tools it needs, reducing token overhead and latency in tool‑heavy systems.
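
To make the Tool Search idea concrete, here is a minimal sketch of the general pattern: keep the full tool catalog outside the prompt and retrieve only the few definitions that match the current request. Everything in it is an assumption for illustration (the `Tool` class, the catalog entries, the `search_tools` helper, and the keyword-overlap scoring); it is not OpenAI's actual Tool Search API or its ranking method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


# Hypothetical tool catalog; names and descriptions are made up for illustration.
CATALOG = [
    Tool("get_weather", "Look up the current weather forecast for a city"),
    Tool("send_email", "Send an email message to a recipient"),
    Tool("query_sales_db", "Run a SQL query against the sales database"),
    Tool("create_invoice", "Create a billing invoice for a customer"),
]

# Very common words that would otherwise match almost every description.
STOPWORDS = {"a", "an", "the", "to", "for", "of"}


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace and underscores, drop stopwords."""
    return set(text.lower().replace("_", " ").split()) - STOPWORDS


def search_tools(query: str, catalog: list[Tool], limit: int = 2) -> list[Tool]:
    """Return up to `limit` tools ranked by keyword overlap with the query."""
    words = tokenize(query)
    scored = []
    for tool in catalog:
        score = len(words & tokenize(tool.name + " " + tool.description))
        if score > 0:
            scored.append((score, tool))
    # Sort by score only; Python's sort is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:limit]]


matches = search_tools("send an email to the customer", CATALOG)
print([tool.name for tool in matches])  # ['send_email', 'create_invoice']
```

Only the matched definitions would then be placed in the model's context, which is where the token and latency savings come from. A production system would likely use embeddings or a proper search index rather than keyword overlap.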

Scale, performance, and reliability

Context window: API variants support a context window of up to about 1M tokens, roughly 2.5 times GPT‑5.2’s 400k and on par with Google’s and Anthropic’s long‑context models.
Efficiency: It solves similar tasks using fewer tokens than GPT‑5.2, so long or complex workflows cost less at the same per‑token price.
Factuality: OpenAI reports 33% fewer incorrect individual claims and 18% fewer responses with any factual error versus GPT‑5.2, calling it their “most factual model yet.”
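
The arithmetic behind the context and efficiency points can be sketched in a few lines. The context sizes come from the figures above. The per-token price and the token counts are hypothetical values chosen only to show how fewer tokens translate into lower cost at an unchanged price; they are not published numbers.

```python
def context_ratio(new_window: int, old_window: int) -> float:
    """How many times larger the new context window is than the old one."""
    return new_window / old_window


def run_cost(tokens: int, price_per_million: float) -> float:
    """Cost of a run billed at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


# Context figures from the text: about 1M tokens versus 400k.
ratio = context_ratio(1_000_000, 400_000)

# Hypothetical numbers, for illustration only: the same task at the same
# price, with the newer model using 20% fewer tokens.
PRICE_PER_MILLION = 10.0
old_cost = run_cost(100_000, PRICE_PER_MILLION)
new_cost = run_cost(80_000, PRICE_PER_MILLION)
savings = 1 - new_cost / old_cost

print(f"context ratio: {ratio:.1f}x")                 # context ratio: 2.5x
print(f"cost savings at equal price: {savings:.0%}")  # cost savings at equal price: 20%
```

If the per-token price differs between models, the saving scales accordingly, so token efficiency alone does not determine the final bill.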

GPT‑5.4 Thinking

Reasoning variant: “GPT‑5.4 Thinking” is a dedicated reasoning model used in ChatGPT, optimized for harder multi‑step problems and long‑horizon tasks.
Interactive thinking: In ChatGPT you can interrupt its long reasoning process mid‑answer and steer or redirect it before it finalizes a response.
Extreme reasoning mode: For API use, reports indicate a mode that allocates more compute/time to difficult questions, particularly valuable for multi‑hour agent runs and complex research or coding tasks.

Availability and positioning

Surfaces: GPT‑5.4 is rolling out via the OpenAI API, Codex, and across paid ChatGPT tiers, with the Thinking variant specifically powering reasoning in ChatGPT.
Role in the lineup: It sits above GPT‑5.3 Instant (latency‑optimized) as the “frontier” workhorse model for professional workflows and autonomous/agentic use cases.

Overall, GPT‑5.4 is OpenAI’s push to turn language models into reliable “junior colleagues” that can read huge contexts, reason more transparently, and actually operate computers end‑to‑end, rather than just chat. It’s specifically aimed at serious, high‑stakes work—coding, research, operations, and agents—so if you’re already comfortable orchestrating tools and workflows, it’s the first GPT‑5‑series model that feels built for production‑grade autonomous systems rather than just better autocomplete.

Learn more about this update on the Official Blog Post.
