Google Starts Limited Rollout of “Gemini 2.5 Ultra” via Deep Think — AI Ultra Subscribers Get Early Access as Frontier Safety Evaluations Continue
Category: Tool Dynamics
Excerpt:
Google has begun a limited rollout (a “grey test”) of its most compute-intensive Gemini 2.5 capability: Deep Think, an “enhanced reasoning” mode inside the Gemini app designed for highly complex math, coding, and multi-step problem solving. Google positions Deep Think as a variant built on Gemini 2.5 Pro’s foundation with extended, parallel thinking time, and it is rolling out to Google AI Ultra subscribers with daily usage limits. Google also says a full “IMO-grade” version of Deep Think is being shared with a small group of mathematicians and academics, while broader access expands gradually as safety and usability data come in.
Google Gemini 2.5 “Ultra” Enters Limited Rollout: Deep Think Grey Test Begins for AI Ultra Subscribers
United States (global rollout pending): Google has started a limited, staged rollout (a “grey test,” also known as a gray release) of its highest-end Gemini 2.5 reasoning capability through Deep Think inside the Gemini app. Google describes Deep Think as an enhanced reasoning mode that uses parallel thinking techniques and extended inference time to solve harder problems than standard Gemini 2.5 Pro. Access is gated behind the premium Google AI Ultra subscription and subject to daily usage limits.
📌 Key Highlights at a Glance
- What’s rolling out: Deep Think in the Gemini app (enhanced reasoning mode)
- How it’s being described: Often called “Gemini 2.5 Ultra” by users and the press; Google positions it as Deep Think on top of Gemini 2.5 Pro’s foundation
- Who gets access: Google AI Ultra subscribers (U.S. launch first; expansion expected)
- Why a grey test: High compute cost, plus safety and misuse risks that increase as reasoning depth grows
- Hard limit: Google states there is a fixed number of Deep Think prompts per day (exact number may vary)
- Tools: Deep Think in the app can work with tools like Google Search and code execution
- Research track: An “IMO-grade” version is being shared with a small set of mathematicians/academics
- Safety posture: Google highlights frontier safety evaluations and planned mitigations for critical capability levels
🤖 What Is Deep Think (and Why People Call It “Gemini 2.5 Ultra”)?
Deep Think is Google’s enhanced reasoning mode that extends Gemini’s “thinking time” using parallel thinking techniques. The model generates multiple candidate approaches, evaluates them simultaneously, and refines a final answer—similar to how humans explore different strategies before committing.
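The generate-then-select pattern described above can be illustrated with a toy example. This is only a sketch of the general idea (run several independent approaches concurrently, score each result, keep the best one); it is not Google's implementation, and it omits the refinement step. The problem (integer square roots), the three strategy functions, and the `closeness` scorer are all hypothetical stand-ins chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical toy task: estimate the integer square root of n.
Strategy = Callable[[int], int]


def guess_half(n: int) -> int:
    """Crude strategy: assume the root is about half of n."""
    return n // 2


def guess_by_digits(n: int) -> int:
    """Crude strategy: a power of ten with roughly half as many digits as n."""
    return 10 ** (len(str(n)) // 2)


def newton_sqrt(n: int) -> int:
    """Careful strategy: integer Newton iteration, stepping down from n."""
    x = n
    while x * x > n:
        x = (x + n // x) // 2
    return x


def closeness(n: int, candidate: int) -> int:
    """Score a candidate: 0 means candidate squared equals n, lower is worse."""
    return -abs(candidate * candidate - n)


def solve_in_parallel(
    n: int,
    strategies: Sequence[Strategy],
    score: Callable[[int, int], int],
) -> int:
    """Run every strategy concurrently, then return the best-scoring candidate."""
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        candidates = list(pool.map(lambda strategy: strategy(n), strategies))
    return max(candidates, key=lambda candidate: score(n, candidate))


if __name__ == "__main__":
    strategies = [guess_half, guess_by_digits, newton_sqrt]
    print(solve_in_parallel(144, strategies, closeness))  # prints 12
```

The extra cost is visible even in this toy: every strategy runs on every problem, which is one reason a mode built this way would be slower and more expensive per response than a single-pass answer.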
Because Deep Think is the most compute-intensive Gemini 2.5 experience and is gated to the top subscription tier, many users refer to it by the shorthand “Gemini 2.5 Ultra.” In Google’s own product messaging, it appears as a selectable mode or tool within the Gemini app for Ultra subscribers rather than as a publicly exposed, standalone API model ID.
⏱️ Why This Is a Grey Test: Cost, Safety, and Usability
Google explicitly frames Deep Think as resource-heavy: it may take longer per response and is limited to a fixed number of daily prompts. That makes “limited rollout” a pragmatic control knob for product stability and cost management.
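A fixed daily cap of this kind is conceptually simple. The sketch below shows one way a per-user daily prompt quota could work; it is purely illustrative and says nothing about how Google enforces its limit. The class name `DailyPromptQuota` and the limit of 2 used in the example are hypothetical, since the actual number of Deep Think prompts per day is not stated here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DailyPromptQuota:
    """Hypothetical per-user counter that resets when the calendar day changes."""

    limit: int      # maximum prompts allowed per day (assumed value, not Google's)
    day: date       # the day the current count applies to
    used: int = 0   # prompts consumed so far on that day

    def try_consume(self, today: date) -> bool:
        """Record one prompt if quota remains; return False once the cap is hit."""
        if today != self.day:
            # A new day has started, so the counter starts over.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


if __name__ == "__main__":
    quota = DailyPromptQuota(limit=2, day=date(2025, 1, 1))
    print(quota.try_consume(date(2025, 1, 1)))  # True
    print(quota.try_consume(date(2025, 1, 1)))  # True
    print(quota.try_consume(date(2025, 1, 1)))  # False: cap reached
    print(quota.try_consume(date(2025, 1, 2)))  # True: new day, counter reset
```

Passing `today` in explicitly, rather than reading the clock inside the method, keeps the behavior easy to test and sidesteps the question of which time zone defines a "day".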
Google also notes that as Gemini’s problem-solving ability increases, it is taking a deeper look at frontier risks and applying mitigations—suggesting that the rollout is also gated by safety evaluation readiness, not just infrastructure capacity.
🔧 How to Try It (If You’re in the Rollout)
- Subscribe to Google AI Ultra (available in the U.S. at launch, per Google).
- Open the Gemini app or web interface and select Gemini 2.5 Pro in the model dropdown.
- Toggle Deep Think in the prompt bar (it appears as a mode/tool rather than the default model for all prompts).
- Use it for complex tasks: multi-file coding, advanced math, algorithmic tradeoffs, and step-by-step design iteration.
🎯 Where Deep Think Shines (Best-Fit Tasks)
Advanced Math & Proof-Style Reasoning
Google ties Deep Think to strong performance on challenging math-style benchmarks and discusses an “IMO-grade” variant for academic testing.
Hard Coding Tasks
Deep Think is positioned for tough coding problems where formulation, tradeoffs, and careful reasoning matter more than speed.
Iterative Design & Systems Thinking
Google highlights iterative building tasks where the model improves solutions step-by-step, including web development and design refinement.
Scientific & Research Synthesis
The rollout is paired with a narrative about helping researchers explore hypotheses and reason through complex literature.
🏁 Competitive Context: “Reasoning Depth” Becomes a Product Tier
Google’s tiering strategy is increasingly clear: keep the most compute-heavy reasoning experiences behind the highest subscription tier, similar to how other labs gate frontier reasoning models. The goal is to balance (1) user demand for top capability with (2) infrastructure cost and (3) safety evaluation cycles.
❓ Frequently Asked Questions
Is “Gemini 2.5 Ultra” an official public model name?
Google’s most explicit official naming in public posts centers on “Deep Think” as a mode available to AI Ultra subscribers, built on Gemini 2.5 Pro’s foundation. Many users and some outlets may refer to this experience as “Gemini 2.5 Ultra,” but the official product surface is Deep Think in the Gemini app.
Why is access limited (grey test)?
Google notes that Deep Think uses more compute and is subject to daily usage limits, and it also references frontier safety evaluations and mitigations. Both factors favor a staged rollout.
Will Deep Think come to the API?
Google said it is working to release Deep Think (with and without tools) to a set of trusted testers via the Gemini API in the weeks following the initial rollout.
The Bottom Line
Gemini 2.5’s “Ultra-level” experience is now effectively in grey test via Deep Think: gated, capped, and expanding gradually as Google gathers safety and usability signals. For power users, it’s an early look at how Google will productize frontier reasoning, treating “thinking depth” as a premium, tightly controlled capability rather than a default model setting for everyone.