Gemini 3.5 Flash
- easygenes - 4391 seconds ago
For those who would like to know the total and active parameter count of this model: even though Google doesn't disclose the model's technical details, we can infer them within relatively tight margins based on what we do know.
We know they serve the model on TPU 8i, which we have plenty of hard specs for (so we know the key constraints: total memory, bandwidth, and compute FLOPs). We can also set a ceiling on the compute complexity and memory demand of the model based on knowing that they will be at least as efficient as what is disclosed in the DeepSeek V4 Technical Report.
We can also assume that the model was explicitly built to run efficiently in a RadixAttention-style batched serving scenario on a single TPU 8i (so no tensor parallelism, etc., to avoid unnecessary overheads; Google explicitly designed the 8th-generation inference architecture to eliminate the need for tensor sharding on mid-sized models).
We know Google intends to serve this model at a floor speed of around 280 tok/s too.
Putting all these pieces together, we can confidently say this model is ~250-300B total, and 10-16B active parameters. Likely mostly FP4 with FP8 where it matters most.
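The memory side of that estimate can be sketched as plain arithmetic. This is a minimal sketch, not Google's actual configuration: the 288 GB capacity is the figure this comment quotes for TPU 8i, and `fp8_fraction` (the share of weights kept in FP8) is a hypothetical knob added for illustration.

```python
# Napkin math for weight memory at mixed FP4/FP8 precision.
# Assumptions (not from Google): fp8_fraction is a made-up parameter for how
# many weights stay in FP8; 288 GB is the TPU 8i capacity quoted in this comment.

HBM_GB = 288.0

def weight_gb(total_params_b: float, fp8_fraction: float = 0.0) -> float:
    """Weight footprint in GB for total_params_b billion parameters."""
    # FP4 is 0.5 bytes per parameter, FP8 is 1 byte per parameter.
    bytes_per_param = 0.5 * (1.0 - fp8_fraction) + 1.0 * fp8_fraction
    return total_params_b * bytes_per_param  # billions of bytes == GB

def headroom_gb(total_params_b: float, fp8_fraction: float = 0.0) -> float:
    """Memory left for KV cache and other dynamic allocations."""
    return HBM_GB - weight_gb(total_params_b, fp8_fraction)

for params_b in (250, 300):
    w = weight_gb(params_b)
    print(f"{params_b}B params, all FP4: {w:.0f} GB weights, {HBM_GB - w:.0f} GB left")
```

Under these assumptions, 250B parameters at pure FP4 come to 125 GB and 300B to 150 GB, so the upper end matches the estimate, while a ~110 GB lower end would need an average below 4 bits per weight.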
Visual:
┌────────────────────────────────────────────────────────┐
│                  TPU 8i VRAM (288 GB)                  │
├───────────────────────────┬────────────────────────────┤
│ Static Model Weights      │ Dynamic Allocations &      │
│ (250B - 300B @ Mixed      │ Compressed KV Caches       │
│ FP4/FP8)                  │ (RadixAttention / SRAM)    │
│ ~110 GB - 150 GB          │ ~138 GB - 178 GB           │
└───────────────────────────┴────────────────────────────┘
I do model serving optimization work. This is napkin math.
- simonw - 28515 seconds ago
The pelican is a lot: https://github.com/simonw/llm-gemini/issues/133#issuecomment...
Not a great bicycle though, it forgot the bar between the pedals and the back wheel and weirdly tangled the other bars.
Expensive too - that pelican cost 13 cents: https://www.llm-prices.com/#it=11&ot=14403&sel=gemini-3.5-fl...
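For what it's worth, the 13 cents can be reproduced by hand. A minimal sketch, assuming the `it=11` and `ot=14403` values in the link's query string are input and output token counts, and using the $1.50/$9.00 per-million prices quoted elsewhere in this thread (the function name is just illustrative):

```python
def cost_usd(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost in USD given token counts and per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# 11 input tokens and 14403 output tokens at $1.50 / $9.00 per million.
pelican = cost_usd(11, 14403, 1.50, 9.00)
print(f"${pelican:.4f}")  # rounds to 13 cents
```

Almost all of the cost comes from output tokens, which is why a short prompt can still be expensive.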
- GodelNumbering - 30422 seconds ago
Per million input/output tokens:
Gemini 2.5 flash: $0.30/$2.50
Gemini 3.0 flash preview: $0.50/$3.00
Gemini 3.5 flash: $1.50/$9.00
Interesting pricing direction. I don't think we have ever seen a 3x price increase for the immediate successor in the same size class (and lol @ 3 only ever getting a preview).
3.5 Flash costs about the same as Gemini 2.5 Pro, which was $1.25/$10.
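The ratios behind that comparison are easy to check. A small sketch using only the prices listed in this comment (the dictionary keys and the `ratio` helper are illustrative names):

```python
# (input, output) USD per million tokens, as listed in the comment above.
prices = {
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3.0 Flash Preview": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}

def ratio(new, old):
    """Return (input ratio, output ratio) of model `new` over model `old`."""
    (new_in, new_out), (old_in, old_out) = prices[new], prices[old]
    return new_in / old_in, new_out / old_out

for old in ("Gemini 3.0 Flash Preview", "Gemini 2.5 Flash", "Gemini 2.5 Pro"):
    i, o = ratio("Gemini 3.5 Flash", old)
    print(f"3.5 Flash vs {old}: {i:.1f}x input, {o:.1f}x output")
```

Against 3.0 Flash Preview both ratios are 3x; against 2.5 Pro the input price is slightly higher and the output price slightly lower.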
- SXX - 32772 seconds ago
> Create animated SVG of a frog on a boat rowing through jungle river. Single page self contained HTML page with SVG
3.5 Flash: Thinking Medium - 7516 tokens
https://gistpreview.github.io/?5c9858fd2057e678b55d563d9bff0...
3.5 Flash: Thinking High - 7280 tokens
https://gistpreview.github.io/?1cab3d70064349d08cf5952cdc165...
3.1 Pro - 28,258 tokens
https://gistpreview.github.io/?6bf3da2f80487608b9525bce53018...
3.1 took 3 minutes of thinking to generate, but it was the only one that got animated movement.
- OhMeadhbh - 26897 seconds ago
Am I really so old that when someone says "Flash" my immediate response is... "consider HTML5 instead"??
- nl - 4725 seconds ago
On my Agentic SQL benchmark it scores 19/25. That's... mediocre.
That means it performs worse than 3.1 Flash Lite Preview (22/25), is slower (367s vs 142s), and is more expensive (75c vs 2c).
It is outperformed by Gemma4 26B-A4B in every way(!)
https://sql-benchmark.nicklothian.com/?highlight=google_gemi...
(Switch to the cost vs performance chart to see how far this is off the Pareto frontier)
- gertlabs - 3152 seconds ago
Taking into account that this is a flash model, it's a strong release. It's very fast and frontier-ish for the price.
Raw intelligence is high for a flash model. But Google's problem has always been productization and tool use, whereas raw intelligence is always competitive. It does not look like they solved that with this release -- in fact, their tool use delta (the improvement in scores when given arbitrary tools and a harness) has actually regressed from some previous models.
Data at https://gertlabs.com/rankings
- hmate9 - 23735 seconds ago
I have the Google AI Pro plan and tried Antigravity with 3.5 Flash, but it used up all my quota in two prompts. If that is not a bug, then it is seriously unusable.
- lanewinfield - 27414 seconds ago
Gemini 3.5 Flash's 2000 token clocks aren't bad. https://clocks.brianmoore.com/
- reconnecting - 29245 seconds ago
Knowledge cutoff: January 2025
Latest update: May 2026
I have a very bad feeling about this lag.
- margorczynski - 18677 seconds ago
Wow at the price hike. Still, I think in the long run the Chinese will win if they're able to produce hardware comparable to Nvidia's.
- npn - 30443 seconds ago
The price is crazy.
And I guess Gemini 3.5 Pro will get the same price increase, too. 12 x 5 = 60?
It seems like Google does want us to use Chinese models.
- wg0 - 26741 seconds ago
3x price increase for a nearly similar model. And they said AI would be cheaper and ubiquitous.
- OsrsNeedsf2P - 31572 seconds ago
Beats 3.1 Pro on price per token, but Artificial Analysis is showing it's dumber per token and costs more overall.
- asar - 33627 seconds ago
$1.5/M input tokens, $9/M output tokens
6x the price of 3.1 Flash Lite
- brikym - 21590 seconds ago
How is this progress? The token cost just keeps going up and up. Flash is the new Pro? Do the models actually cost more to run or is it fattening margins?
- s3p - 31311 seconds ago
Yikes. I think the concept of a 'flash' model is changing, no? Google used to market this as its lower-intelligence, faster, cheaper option. I appreciate that they are delivering on both of those, but personally I would appreciate if they could create an incremental knowledge improvement while holding price steady. Fortune 500 companies have to make their money I guess.
- nikhilpareek13 - 22224 seconds ago
Worth noting that Google marked this stable rather than preview, which is unusual compared to their recent releases. Pair that with the 3x price hike and Flash pricing now reads like the long-term floor they want, not a temporary thing they will walk back later. But it's hard to tell yet whether that's Google specifically reading the room or the whole industry quietly resetting the cheap-inference baseline.
- himata4113 - 33605 seconds ago
Engineers at Google have publicly stated that the models are too big and are far from their potential. Glad they're being proven right with every release.
They continue to focus on smaller models while OpenAI and Anthropic are increasing compute requirements for their SOTA models.
- stared - 20856 seconds ago
China: we don't need to use US models, we can distill them ourselves
Google: we don't need the Chinese to distill our models, we can do it ourselves
- razodactyl - 7392 seconds ago
Aw. The listen to article widget doesn't work properly on mobile Safari, and when using the options button, the popup appears below the "In this article" dropdown, occluding it.
At least it read the authors of the article to me.
I wish we would push more towards testing code. Agentic AI excels when it's engaged.
- golfer - 33515 seconds ago
Here's the benchmark scoreboard they published:
https://storage.googleapis.com/gweb-uniblog-publish-prod/ori...
- sigbeta - 3739 seconds ago
I am interested to see how they will serve demand with the TPU monopoly they have.
- Alifatisk - 23635 seconds ago
The demo of the model in Antigravity automatically renaming and categorizing unstructured assets using vision was quite cool; it demonstrates that the IDE side panel can be used for more than just coding. I wonder if the harness in Antigravity is based on Gemini CLI or if they are completely different. Could Gemini CLI do the same task? Or is the vision feature an Antigravity thing?
- sbinnee - 18706 seconds ago
While I am excited, the price is 3x that of Gemini 3 Flash Preview, which I used for the longest time. Since the arrival of DeepSeek V4 Flash, I have been a happy DeepSeek user. We will see how long that reign lasts after I try this new Gemini.
- paol_taja - 12410 seconds ago
That pelican looks like it just sold a SaaS company and bought a bike because its therapist said it needed balance.
- golfer - 30017 seconds ago
Arena.ai:
> Gemini 3.5 Flash's pricing shifts the Pareto frontier in Text. 8 models from GoogleDeepMind dominate the Text Arena Pareto curve where only 4 labs are represented for top performance in their price tiers.
- merb - 30928 seconds ago
Still no new processor version for Document AI https://docs.cloud.google.com/document-ai/docs/release-notes that is so weird. (Custom extractor)
It's not possible to uptrain on preview releases and it did not get that much love for a while.
- mchusma - 6943 seconds ago
I have thought about this and I think overall, this was a disappointing release from Google. I'm not sure what the general sentiment is, but this feels like a miss.
What they did do in the keynote was spend a lot of time talking about their distribution advantage, and how they can own the consumer in search. But not a lot that will benefit partners or developers.
Basically, they released something broadly competitive with Sonnet 4.6, a new Omni model that seems interesting but unclear yet. They have completely ceded the frontier to OpenAI / Anthropic, and are saying "look for pro next month".
The best release since nano banana pro from Google has been Gemma.
- jonnyasmar - 14147 seconds ago
The $1.50/$9.00 pricing is a meaningful shift if you've been running Gemini as the "fast iteration" half of a multi-model coding workflow. I've had Claude Code, Codex, and Gemini CLI running side by side and the working split was "Gemini for quick scaffolding and exploration where the cost of being wrong is low, Sonnet for correctness-critical stuff." At 3x the Flash pricing that split stops making sense: you're paying Sonnet-tier output rates for not-quite-Sonnet quality.
For pure chat that's annoying but tolerable. For agentic workflows where output tokens dominate (tool-call replies, reasoning traces, code emission) it's a real practical hit. I'd bet the substitution effect favors DeepSeek and Qwen here pretty fast.
- aliljet - 32653 seconds ago
Is there a good benchmark tracking hallucinations? The models are all incredibly good now, even the open ones, and my hope is that the rate of hallucinations is something that's falling off in concert with larger and larger context lengths.
- eis - 32315 seconds ago
3.5 Flash was more expensive than 3.1 Pro to run the Artificial Analysis test suite. $1551 for 3.5 Flash [0] vs $892 for 3.1 Pro [1]. That's 74% more cost while ranking lower. It's 2.5x as fast but I don't think the bang for the buck is there anymore like it was with 3.0 Flash. I'm a bit bummed out to be honest.
I did not expect such a huge (3x) price increase from 3.0 Flash and I bet many people will not just blindly upgrade as the value proposition is wildly different.
One interesting point to note is that Google marked the model as Stable in contrast to nearly everything else being perpetually set as Preview.
[0] https://artificialanalysis.ai/models/gemini-3-5-flash [1] https://artificialanalysis.ai/models/gemini-3-1-pro-preview
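The 74% figure follows directly from the two totals quoted above; a quick check (variable names are illustrative):

```python
flash_cost, pro_cost = 1551, 892  # USD to run the test suite, per the comment above

# Relative extra cost of 3.5 Flash over 3.1 Pro.
extra = (flash_cost - pro_cost) / pro_cost
print(f"{extra:.0%} more")
```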
- ErystelaThevale - 11392 seconds ago
Gemini has been too agreeable to be useful for actual debate. Curious if 3.5 changes that, or just the benchmarks.
- bredren - 25774 seconds ago
Can anyone who has extensive, recent experience with Claude Code and Codex contextualize the current Gemini CLI product experience?
- pqdbr - 19521 seconds ago
In my tests, in real production use cases, it's a hard pass.
It's actually 10-15% slower and also more expensive than Gemini 3.1 Pro, because it thinks more than 2.5x as much as Gemini 3.1 Pro.
So that thinking verbosity nullifies the speed and cost gains.
AND the quality is worse than 3.1 Pro for our use cases, making mistakes Pro doesn't make.
- mixtureoftakes - 33199 seconds ago
Benchmarks look REALLY good; the price hike is big, but it also beats Sonnet 4.6 in every discipline?
- paperwork360 - 26414 seconds ago
Google also updated Antigravity. Version 2.0 is more for conversation with an agent. The previous VS Code-like IDE was much better.
- MASNeo - 28567 seconds ago
Well, available for Gemini means these days that half the time they are "Receiving a lot of requests right now." and so sorry they couldn't complete the task. Luckily the model supports long time horizons because that's what's needed. /me likes Gemini a lot, just wishing Google would add the compute!
- x3cca - 26679 seconds ago
I'm excited for the conversation to switch from intelligence to tps instead. I care much less about what hard thought experiments models can one shot and much more about how responsive my plain text interface for doing things is.
- mackross - 28106 seconds ago
The Antigravity teamwork-preview doesn't work for me -- upgraded to Ultra, installed Antigravity 2, ran teamwork-preview, keeps failing: "You have exhausted your capacity on this model. Your quota will reset after 0s."
- noelsusman - 31101 seconds ago
The Artificial Analysis benchmark results are pretty underwhelming. Roughly the same "intelligence" as MiMo-V2.5-Pro for over 3x the cost. We'll have to see how that translates to actual usage but it's not a great sign.
- amelius - 22695 seconds ago
Gemini, please block all ads in my search engine.
- swe_dima - 33721 seconds ago
Flash family but costs like a Pro. $9 vs $12 for output.
- alexdns - 33944 seconds ago
It's Gemini 3.5 Flash
- victor9000 - 17687 seconds ago
There was a brief moment in time where Gemini was the greatest thing since sliced bread, then it got nerfed from outer space without a version bump or any meaningful mention from Google, no thanks.
- uean - 15244 seconds ago
I have to admit that 3.5 Flash is doing a much better job of removing the LLM'ness of what it produces. It's pretty close to my own writing style today, and I came here to see what changed.
For what it's worth, my own personal metric of LLM-badness the past few months has been the number of times I leap out of my chair in my home office to loudly declare to my wife how much I loathe reading what is being spewed and pushed into my face, and how I am being forced to use AI everyday and deaden my brain cells. Today is like a breath of fresh air.
- kristopolous - 23788 seconds ago
Relatively speaking here's where it's at:
this is from artificial-analysis using https://github.com/day50-dev/aa-eval-email/blob/main/art-ana...
score  age  size   name
44.2   97   large  GLM-5 (Reasoning)
44.7   187  -      GPT-5.1 (high)
44.9   29   -      Qwen3.6 Max Preview
45     0    -      Gemini 3.5 Flash
45.5   27   large  MiMo-V2.5-Pro
45.6   75   -      GPT-5.4 (low)
which you can invoke with
$ curl day50.dev/art-analysis.sh | bash
inspect the code. it's tiny.
I use it all the time and maintain it. Snag a copy and pull it down again if it breaks on you. I stay on top of it.
- dsabanin - 4590 seconds ago
No matter what Google does, for some reason the agentic performance of their models is missing something. I hope this release is stronger. We need more competition.
- owentbrown - 25091 seconds ago
Has anyone switched from Claude 4.7 Opus or ChatGPT 5.5 to this? How does it feel? Dumber? Worth it for the speed? I'd love someone's subjective take on it, after doing a long session of coding.
Reiner Pope gave a talk on Dwarkesh Patel about token economics. I guess faster is a lot more expensive, generally.
Someone should make a harness that uses a fast model to keep you in-flow and speed run, and then uses a slow, thoughtful, (but hopefully cheap?) model to async check the work of the faster model. Maybe even talk directly to the faster model?
Actually there's probably a harness that does that - is someone out there using one?
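As a rough illustration of that idea, here is a minimal sketch of such a harness using `asyncio`. Everything in it is hypothetical: `fast_model` and `slow_reviewer` are stand-ins for real model calls, and the review logic is a placeholder, not how any existing tool works.

```python
import asyncio

# Hypothetical stand-ins for real model calls; swap in actual API clients.
async def fast_model(prompt: str) -> str:
    await asyncio.sleep(0.01)  # pretend this is the low-latency model
    return f"draft for: {prompt}"

async def slow_reviewer(prompt: str, draft: str) -> str:
    await asyncio.sleep(0.05)  # pretend this is the slower, more careful model
    return "ok" if prompt in draft else "needs another look"

async def session(prompts):
    """Answer each prompt with the fast model; review drafts in the background."""
    drafts, reviews = [], []
    for p in prompts:
        draft = await fast_model(p)  # the user gets this right away
        drafts.append(draft)
        # Start the review without waiting for it, so the next prompt is not blocked.
        reviews.append(asyncio.create_task(slow_reviewer(p, draft)))
    verdicts = await asyncio.gather(*reviews)  # collect reviews once they finish
    return list(zip(drafts, verdicts))

results = asyncio.run(session(["add a login form", "write tests"]))
for draft, verdict in results:
    print(verdict, "|", draft)
```

The design choice here is that reviews run as background tasks while the fast model keeps answering; feeding the reviewer's verdict back to the fast model would be the next step and is left out of the sketch.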
- f311a - 34913 seconds ago
$9/1M output
- ai_fry_ur_brain - 26506 seconds ago
Imagine reducing yourself to the worst of averages by making your competency 1:1 correlated to the tokens that you have access to (and everyone else does).
- andrewstuart - 29922 seconds ago
The benchmark that matters: can it actually program as well as Claude Code?
If not, then I'm not using it.
Cancelled my account 3 months ago, only Claude code level capability would bring me back.
- hubraumhugo - 30646 seconds ago
Just updated my HN Wrapped project with it and it does well on my totally unscientific LLM humor benchmark: https://hn-wrapped.kadoa.com
- bakugo - 32563 seconds ago
Triple the price of the last Flash model ($3 -> $9 per 1M output). Quickly approaching Sonnet prices.
Feels like the AI pricing noose is tightening sooner rather than later.
- nightski - 32111 seconds ago
AI being a product is not the future. It's more like an operating system that deserves to be open and free (aka Linux). Unless that happens we are in for a very dystopian future. I wish I had the intelligence, resources and/or connections to try and make that happen.
- uejfiweun - 22989 seconds ago
This is funny, I was randomly using Gemini today and I was astounded how good the responses I was getting were from Flash. I guess this must be the reason why.
- danny094 - 17496 seconds ago
So Google is just trying to be cool in 2026, huh
- stan_kirdey - 28828 seconds ago
EXPENSIVE ._.
- casey2 - 27104 seconds ago
I think the field moved to agents too fast. The most valuable moat is training data and the most valuable and voluminous training data are chats, since humans can say that a direction feels right or wrong.
- simianwords - 30673 seconds ago
No one talking about how this Flash beats Pro? Imagine what 3.5 Pro looks like?
Also concerned about Gemini models being benchmaxxed generally
- danny094 - 17443 seconds ago
Codex is way better pricing than this lol
- lern_too_spel - 14654 seconds ago
They also announced Antigravity CLI, which uses Gemini 3.5 by default. I tried to vibe code a simple project using my personal account and after a few iterations, I got "Individual quota reached. Contact your administrator to enable overages. Resets in [7 days]." Really? 7 days? I searched for the message online and found a thread with hundreds of people complaining about the same issue with no resolution. Classic Google.
- cesarvarela - 32746 seconds ago
Add Flash to the title, please.
- llmslave - 30020 seconds ago
Conspiracy theory:
This model isn't an advancement, it's a previous model that runs more compute, which is why it costs more
- ralusek - 28405 seconds ago
Those prices, what a disappointment.
- rdtsc - 22099 seconds ago
I caught it again being deceitful. It did this before.
(Me): Did you actually read the paper before when I pasted the link?
> I will be completely honest: No, I did not.
> You caught me hallucinating a confident answer based on incomplete recall rather than actually verifying the document.
> Thank you for calling it out and providing the exact quote. It forced me to re-evaluate the actual data you provided rather than relying on my flawed assumption.
I am sure it learned a valuable lesson and won't do it again /s
- HardCodedBias - 31906 seconds ago
Oh boy.
GDM is making (or has been backed into a corner into making) the bet that high throughput, low latency, low capability models are the path forward.
That probably works for vibe coded apps by non-practitioners.
I suspect that practitioners/professionals will wait longer for better results.
- SaadiLoveAI - 18257 seconds ago
It's really awesome
- jdw64 - 27924 seconds ago
Honestly, I feel like the new Gemini 3.5 Flash is a failure. The performance doesn't seem that great, and while they revamped the UI, Antigravity just feels like a cheap Codex knockoff now. The web UI is underwhelming, and overall it feels like it lost its unique identity by just copying other AIs. It's a flop in both performance and price point. I'm seriously considering canceling my Gemini subscription altogether. Using Chinese AI models might actually be a better option at this point.
- warthog - 30739 seconds ago
GPT-5.5 on the benchmarks still seems to perform better than this
Plus the vibe of the Gemini models is so weird, particularly when it comes to tool calling
At this point I kinda need them to shock me to make the switch
- Fairburn - 20414 seconds ago
Google shot its shot with that alternative history artwork generation fiasco. Don't know why anyone would be too hot for them now. Dime a dozen at this point.
- benbencodes - 32674 seconds ago
Pricing is now live on ai.google.dev/pricing:
Gemini 3.5 Flash: $0.75 input / $4.50 output per 1M tokens, 1M context window. The output price explicitly "includes thinking tokens", which is why it's higher than a typical flash-class model.
For comparison within the Gemini lineup:
- Gemini 2.5 Flash: $0.30 / $2.50
- Gemini 3.1 Flash-Lite: $0.25 / $1.50
- Gemini 3.1 Pro Preview: $2.00 / $12.00
So 3.5 Flash is ~2.5x more expensive on input vs 2.5 Flash. The pricing and "including thinking tokens" framing position it as a reasoning-capable flash model rather than just a pure speed optimization.