Gemini 3 Deep Think
- lukebechtel - 21423 seconds ago
Arc-AGI-2: 84.6% (vs 68.8% for Opus 4.6)
Wow.
https://blog.google/innovation-and-ai/models-and-research/ge...
- logicprog - 14290 seconds ago
Is it just me, or is the rate of model releases accelerating to an absurd degree? Today we have Gemini 3 Deep Think and GPT 5.3 Codex Spark. Yesterday we had GLM5 and MiniMax M2.5. Five days before that we had Opus 4.6 and GPT 5.3. Then maybe two weeks before that, I think, we had Kimi K2.5.
- xnx - 19904 seconds ago
Google is absolutely running away with it. The greatest trick they ever pulled was letting people think they were behind.
- rob-wagner - 3274 seconds ago
I've been using Gemini 3 Pro on a historical document archiving project for an old club. One of the guys had been scanning old handwritten minute books, written in German, that were challenging to read (1885 through 1974). Anyway, I was getting decent results on a first pass with 50-page chunks, but ended up doing 1 page at a time (accuracy probably 95%). For each page, I submit the page for a transcription pass, followed by a translation of the returned transcription. About 2,370 pages, and sitting at about $50 in Gemini API billing. The output will need manual review, but the time savings is impressive.
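The two-pass, page-at-a-time flow described above is straightforward to script. Here is a minimal sketch, assuming the google-genai Python SDK; the model name and both prompts are placeholders, not what rob-wagner actually used:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

MODEL = "gemini-3-pro-preview"  # placeholder model identifier

def process_page(page_png: bytes) -> tuple[str, str]:
    # Pass 1: transcribe the handwritten German page from its scan.
    transcription = client.models.generate_content(
        model=MODEL,
        contents=[
            types.Part.from_bytes(data=page_png, mime_type="image/png"),
            "Transcribe this handwritten German page exactly as written.",
        ],
    ).text

    # Pass 2: translate the returned transcription, not the image.
    translation = client.models.generate_content(
        model=MODEL,
        contents="Translate this German text into English:\n\n" + transcription,
    ).text

    return transcription, translation
```

Submitting the transcription text, rather than the image, for the translation pass matches the two-stage flow described in the comment and makes each stage easy to review independently.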
- sigmar - 21450 seconds ago
Here are the methodologies for all the benchmarks: https://storage.googleapis.com/deepmind-media/gemini/gemini_...
The arc-agi-2 score (84.6%) is from the semi-private eval set. If gemini-3-deepthink gets above 85% on the private eval set, it will be considered "solved".
>Submit a solution which scores 85% on the ARC-AGI-2 private evaluation set and win $700K. https://arcprize.org/guide#overview
- sega_sai - 885 seconds ago
I do like Google's models (and I pay for them), but the lack of a competitive agent is a major flaw in Google's offering. It is simply not good enough in comparison to Claude Code. I wish they would put some effort there (as I don't want to pay for two subscriptions, to both Google and Anthropic).
- simianwords - 21142 seconds ago
OT, but my intuition says that there's a spectrum:
- non-thinking models
- thinking models
- best-of-N models, like deep think and gpt pro (a minimal sketch follows this comment)
Each one is of a certain computational complexity. Simplifying a bit, I think they map to linear, quadratic, and n^3 respectively.
I think there is a certain class of problems that can't be solved without thinking, because it necessarily involves writing in a scratchpad. And the same goes for best-of-N, which involves exploring.
Two open questions:
1) What's the higher level here; is there a 4th option?
2) Can a sufficiently large non-thinking model perform the same as a smaller thinking one?
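To make the third rung of that spectrum concrete, here is a minimal best-of-N sketch. `generate` and `score` are hypothetical stand-ins for a sampling call to the model and a verifier or reward model; nothing here reflects how deep think or gpt pro actually work internally:

```python
from typing import Callable

def best_of_n(
    prompt: str,
    generate: Callable[[str], str],  # one sampled completion per call
    score: Callable[[str], float],   # verifier / reward model
    n: int = 8,
) -> str:
    # Sample N independent candidates, then keep the highest-scoring one.
    # Cost is roughly N times that of a single (thinking-model) pass.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)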
- Scene_Cast2 - 13562 seconds ago
It's a shame that it's not on OpenRouter. I hate platform lock-in, but the top-tier "deep think" models increasingly require the use of their own platforms.
- Decabytes - 7599 seconds ago
Gemini has always felt like someone who was book smart to me. It knows a lot of things, but if you ask it to do anything off script, it completely falls apart.
- jetter - 13332 seconds ago
It is interesting that the video demo generates an .stl model. I run a lot of tests of LLMs generating OpenSCAD code (as I recently launched https://modelrift.com, a text-to-CAD AI editor), and the Gemini 3 family LLMs actually give the best price-to-performance ratio now. But they are very, VERY far from being able to spit out a complex OpenSCAD model in one shot. So I had to implement a full-fledged "screenshot-vibe-coding" workflow where you draw arrows on a 3D model snapshot to explain to the LLM what is wrong with the geometry (a rough sketch of the loop follows below). Without a human in the loop, all top-tier LLMs hallucinate at debugging 3D geometry in agentic mode and fail spectacularly.
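The human-in-the-loop workflow jetter describes can be sketched as a render-annotate-revise loop. This is a rough, hypothetical outline, not modelrift's actual implementation: the LLM and annotation steps are stand-in callables, and only the `openscad` CLI call is real:

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Optional

def render_openscad(scad_code: str) -> bytes:
    """Render OpenSCAD source to a PNG snapshot via the openscad CLI."""
    with tempfile.TemporaryDirectory() as tmp:
        src, out = Path(tmp) / "model.scad", Path(tmp) / "model.png"
        src.write_text(scad_code)
        subprocess.run(["openscad", "-o", str(out), str(src)], check=True)
        return out.read_bytes()

def refine(
    generate: Callable[[str], str],                # prompt -> OpenSCAD source
    revise: Callable[[str, bytes], str],           # (source, annotated PNG) -> revised source
    annotate: Callable[[bytes], Optional[bytes]],  # human draws arrows, or None if satisfied
    prompt: str,
    max_rounds: int = 5,
) -> str:
    scad_code = generate(prompt)
    for _ in range(max_rounds):
        feedback = annotate(render_openscad(scad_code))
        if feedback is None:  # geometry accepted by the human
            break
        scad_code = revise(scad_code, feedback)
    return scad_code
```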
- anematode - 2672 seconds ago
It found a small but nice optimization in Stockfish: https://github.com/official-stockfish/Stockfish/pull/6613
Previous models, including Claude Opus 4.6, have generally produced a lot of noise/things that the compiler already reliably optimizes out.
- Metacelsus - 21470 seconds ago
According to the benchmarks in the announcement, it is healthily ahead of Claude 4.6. I guess they didn't test against ChatGPT 5.3, though.
Google has definitely been pulling ahead in AI over the last few months. I've been using Gemini and finding it better than the other models (especially for biology, where it doesn't refuse to answer harmless questions).
- aliljet - 10633 seconds ago
The problem here is that it looks like this is released with almost no real access. How are people using this without submitting to a $250/mo subscription?
- siva7 - 15768 seconds ago
I can't shake the feeling that Google's Deep Think models are not really different models, but just the old ones being run with a higher number of parallel subagents, something you can do yourself with their base model and opencode.
- sinuhe69 - 17465 seconds ago
I'm pretty certain that DeepMind (and all the other labs) will try their frontier (and even private) models on First Proof [1].
And I wonder how Gemini Deep Think will fare. My guess is that it will get halfway on some problems. But we will have to take an absence as a failure, because nobody wants to publish a negative result, even though that's so important for scientific research.
- simonw - 17435 seconds ago
The pelican riding a bicycle is excellent. I think it's the best I've seen.
- neilellis - 19158 seconds ago
Less than a year to destroy Arc-AGI-2. Wow.
- Legend2440 - 6544 seconds ago
I'm really interested in the 3D-STL-from-photo process they demo in the video.
Not interested enough to pay $250 to try it out, though.
- ramshanker - 18000 seconds ago
Do we get any model architecture details, like parameter size etc.? A few months back, we used to talk more about this; now it's mostly about model capabilities.
- vessenes - 18917 seconds ago
Not trained for agentic workflows yet, unfortunately. This looks like it will be fantastic when they have an agent-friendly one. Super exciting.
- Dirak - 10313 seconds ago
Praying this isn't another Llama 4 situation where the benchmark numbers are cooked. 84.6% on Arc-AGI-2 is incredible!
- jonathanstrange - 20503 seconds ago
Unfortunately, it's only available with the Ultra subscription, if it's available at all.
- ismailmaj - 15948 seconds ago
Top-10 Elo on Codeforces is pretty absurd.
- andrewstuart - 14806 seconds ago
Gemini was awesome, and now it's garbage.
It's impossible for it to do anything but cut code down, drop features, lose stuff, and give you less than the code you put in.
It's puzzling, because it spent months at the head of the pack; now I don't use it at all, because why would I want any of those things when I'm doing development?
I'm a paid subscriber, but there's no point anymore; I'll spend the money on Claude 4.6 instead.
- m3kw9 - 14921 seconds ago
Gemini 3 Pro/Flash has been stuck in preview for months now. Google is slow, but they progress like a massive rock giant.
- okokwhatever - 16334 seconds ago
I need to test the sketch creation ASAP. I need this in my life, because learning to use FreeCAD is too difficult for a busy person like me (and, frankly, also quite a lazy one).
- syntaxing - 20760 seconds ago
Why a Twitter post and not the official Google blog post? https://blog.google/innovation-and-ai/models-and-research/ge...
- HardCodedBias - 12297 seconds ago
Always the same with Google.
Gemini has been way behind from the start.
They use the firehose of money from search to make it as close to free as possible so that they have some adoption numbers.
They use the firehose from search to pay for tons of researchers to hand-hold academics, so that their non-economic models and non-economic test-time compute can solve isolated problems.
It's all so tiresome.
Try making models that are actually competitive, Google.
Sell them on the actual market and win on actual work product in millions of people's lives.
- dperhar - 15020 seconds ago
Does anyone actually use Gemini 3 now? I can't stand its sleek, salesy way of introducing things, and it doesn't hold to instructions strictly, which makes it inapplicable for MECE breakdowns or for writing.