What Gemini 3 Flash Is
Gemini 3 Flash sits within the Gemini 3 family as the high-throughput counterpart to Gemini 3 Pro. While Gemini 3 Pro focuses on maximum reasoning depth and absolute peak performance, Gemini 3 Flash focuses on scale, speed, and cost efficiency while retaining strong reasoning and multimodal ability.
Rather than being a downgraded model, Flash is designed for production environments where response time, throughput, and unit economics matter. Google describes this positioning as "frontier intelligence that scales with you," meaning the model is intended to handle serious reasoning tasks without the latency and expense usually associated with flagship models.
The fact that Gemini 3 Flash is already replacing earlier models inside Google products suggests internal confidence in its quality, robustness, and safety.
Core Capabilities
Gemini 3 Flash is natively multimodal—it does not treat vision or audio as add-ons. The model directly understands and reasons across multiple input types.
- Text & Documents: long documents, complex queries, and multi-turn conversations
- Code Intelligence: analysis, debugging, generation, and code review
- Image Understanding: spatial reasoning, UI screenshots, diagrams, and charts
- Video Frames: extracting insights and reasoning about video content
- Audio Input: processing and understanding audio natively
- Adaptive Thinking: dynamically adjusting reasoning depth based on complexity
This makes Gemini 3 Flash suitable for coding assistants, AI agents, visual question answering systems, document processing pipelines, and applications that combine screenshots, logs, diagrams, and instructions.
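As a rough illustration, a single request can mix an image with text in one call. The sketch below assumes the google-genai Python SDK; the model ID, file names, and prompt are placeholders, not values confirmed by this article.

```python
# Minimal sketch of a multimodal request, assuming the google-genai Python SDK.
# The model ID and file names are placeholders; check the official docs.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("architecture_diagram.png", "rb") as f:  # hypothetical diagram
    diagram = f.read()
with open("error_log.txt", "r", encoding="utf-8") as f:  # hypothetical log excerpt
    log_text = f.read()

response = client.models.generate_content(
    model="gemini-3-flash",  # assumed model ID
    contents=[
        types.Part.from_bytes(data=diagram, mime_type="image/png"),
        "Here is our service diagram and a production log excerpt:\n" + log_text,
        "Which component in the diagram is the most likely source of these errors?",
    ],
)
print(response.text)
```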
Adaptive Thinking
One of the most important changes in Gemini 3 Flash is its adaptive thinking system. The model dynamically adjusts how much internal reasoning it performs:
- For easy queries, it responds quickly with minimal computation
- For complex queries, it automatically increases internal reasoning effort
- No configuration is required from the developer
On average, Gemini 3 Flash consumes roughly 30% fewer tokens than Gemini 2.5 Pro on reasoning-heavy workloads, so even where per-token pricing is comparable, the total cost of completing a task is often lower because fewer tokens are consumed.
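One way to observe this indirectly is to compare the token usage reported for prompts of different difficulty. The sketch below assumes the google-genai Python SDK and a placeholder model ID; the thoughts_token_count field is read defensively because its availability can vary by SDK version and model.

```python
# Sketch: observing adaptive thinking indirectly through reported token usage.
# Assumes the google-genai Python SDK; "gemini-3-flash" is a placeholder model ID.
from google import genai

client = genai.Client()

prompts = {
    "easy": "What is the capital of France?",
    "hard": "Prove that the sum of the first n odd numbers equals n squared.",
}

for label, prompt in prompts.items():
    response = client.models.generate_content(model="gemini-3-flash", contents=prompt)
    usage = response.usage_metadata
    # thoughts_token_count may not be populated on every SDK version or model.
    thoughts = getattr(usage, "thoughts_token_count", None)
    print(f"{label}: total={usage.total_token_count}, "
          f"output={usage.candidates_token_count}, thinking={thoughts}")
```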
Pricing, Speed, and Efficiency
| Token type | Price (per 1M tokens) |
|---|---|
| Text input | $0.50 |
| Text output | $3.00 |
| Audio input | $1.00 |

- Time to first token: under 1 second
- Output speed: ~218 tokens/second
- 3x faster than Gemini 2.5 Pro
- Context caching for repeated prompts
- Batch processing APIs for discounts
- Ideal for high-volume systems
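To translate these rates into request-level costs, a small helper like the one below works directly from token counts. It simply applies the list prices quoted above and ignores context caching and batch discounts.

```python
# Back-of-the-envelope cost estimator using the list prices above:
# $0.50 per 1M text-input tokens, $3.00 per 1M output tokens, $1.00 per 1M audio-input tokens.
PRICE_PER_MILLION = {"text_in": 0.50, "text_out": 3.00, "audio_in": 1.00}

def estimate_cost(text_in: int, text_out: int, audio_in: int = 0) -> float:
    """Return the approximate USD cost of one request, excluding caching/batch discounts."""
    return (
        text_in * PRICE_PER_MILLION["text_in"]
        + text_out * PRICE_PER_MILLION["text_out"]
        + audio_in * PRICE_PER_MILLION["audio_in"]
    ) / 1_000_000

# Example: a 20,000-token prompt with a 2,000-token answer works out to about $0.016.
print(f"${estimate_cost(20_000, 2_000):.4f}")
```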
Comprehensive Benchmark Comparison
Gemini 3 Flash consistently lands at the best price-to-performance point among tested models. It outperforms Gemini 2.5 Pro across many reasoning, coding, and multimodal benchmarks while being both faster and cheaper.
| Benchmark | Description | Gemini 3 Flash | Gemini 3 Pro | Gemini 2.5 Flash | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.2 | Grok 4.1 Fast |
|---|---|---|---|---|---|---|---|---|
| Input price | $/1M tokens | $0.50 | $2.00 | $0.30 | $1.25 | $3.00 | $1.75 | $0.20 |
| Output price | $/1M tokens | $3.00 | $12.00 | $2.50 | $10.00 | $15.00 | $14.00 | $0.50 |
| Humanity's Last Exam | Academic reasoning (no tools) | 33.7% | 37.5% | 11.0% | 21.6% | 13.7% | 34.5% | 17.6% |
| ARC-AGI-2 | Visual reasoning puzzles | 33.6% | 31.1% | 2.5% | 4.9% | 13.6% | 52.9% | — |
| GPQA Diamond | Scientific knowledge | 90.4% | 91.9% | 82.8% | 86.4% | 83.4% | 92.4% | 84.3% |
| AIME 2025 | Mathematics (no tools) | 95.2% | 95.0% | 72.0% | 88.0% | 87.0% | 100% | 91.9% |
| MMMU-Pro | Multimodal understanding | 81.2% | 81.0% | 66.7% | 68.0% | 68.0% | 79.5% | 63.0% |
| ScreenSpot-Pro | Screen understanding | 69.1% | 72.7% | 3.9% | 11.4% | 36.2% | 86.3% | — |
| CharXiv Reasoning | Chart synthesis | 80.3% | 81.4% | 63.7% | 69.6% | 68.5% | 82.1% | — |
| Video-MMMU | Knowledge from videos | 86.9% | 87.6% | 79.2% | 83.6% | 77.8% | 85.9% | — |
| LiveCodeBench Pro | Competitive coding (Elo rating) | 2316 | 2439 | 1143 | 1775 | 1418 | 2393 | — |
| Terminal-bench 2.0 | Agentic terminal coding | 47.6% | 54.2% | 16.9% | 32.6% | 42.8% | — | — |
| SWE-bench Verified | Agentic coding | 78.0% | 76.2% | 60.4% | 59.6% | 77.2% | 80.0% | 50.6% |
| τ2-bench | Agentic tool use | 90.2% | 90.7% | 79.5% | 77.8% | 87.2% | — | — |
| Toolathlon | Long-horizon real-world tasks | 49.4% | 36.4% | 3.7% | 10.5% | 38.9% | 46.3% | — |
| MCP Atlas | Multi-step MCP workflows | 57.4% | 54.1% | 3.4% | 8.8% | 43.8% | 60.6% | — |
| FACTS Benchmark | Factuality & grounding | 61.9% | 70.5% | 50.4% | 63.4% | 48.9% | 61.4% | 42.1% |
| SimpleQA Verified | Parametric knowledge | 68.7% | 72.1% | 28.1% | 54.5% | 29.3% | 38.0% | 19.5% |
| MMMLU | Multilingual Q&A | 91.8% | 91.8% | 86.6% | 89.5% | 89.1% | 89.6% | 86.8% |
| Global PIQA | Commonsense reasoning | 92.8% | 93.4% | 90.2% | 91.5% | 90.1% | 91.2% | 85.6% |
Source: DeepMind evaluation methodology. For details see deepmind.google/models/evals-methodology/gemini-3-flash
Gemini 3 Flash vs Earlier Gemini Models
vs Gemini 2.5 Pro
Gemini 3 Flash outperforms Gemini 2.5 Pro across many reasoning, coding, and multimodal benchmarks while being both faster and cheaper in practice. It delivers higher throughput, lower latency, and improved long-context and visual reasoning.
vs Gemini 2.5 Flash
Gemini 2.5 Flash was designed primarily as a lightweight, low-cost model. Gemini 3 Flash closes much of the gap with Pro-class models, offering deeper reasoning, stronger visual understanding, and more consistent performance on complex tasks.
Key Advantages
Enhanced Visual Reasoning
Compared with the Gemini 2.5 generation, visual and spatial reasoning is noticeably stronger in Gemini 3 Flash. Tasks such as counting objects, understanding layouts, and interpreting UI screenshots show consistent improvements. On ScreenSpot-Pro, the score jumps from 3.9% (Gemini 2.5 Flash) to 69.1% (Gemini 3 Flash), a large gain in screen-understanding capability.
Experience Gemini 3 Flash
Frontier intelligence that scales with you. Near-Pro-level reasoning at Flash-level speed and cost.