Gemini 3.5 Flash: Frontier Intelligence At Full Speed

alist1986 · May 20, 2026

Ni-refesh ko nga api ko tukayo nang mabasa ko ito, para lumabas yan sa aking GUI at binusisi yung advantages nya. Ito yung specs nya at kung papaano gamitin para di masayang yung tokens. May apat siyang thinking effort levels. Basahin na lang ng user to understand the meaning.
You do not have permission to view the full content of this post. Log in or register now.

Ito yung version changes from v2.5 flash to gemini-3.5-flash (GA)
Quick Summary Table of Structural Changes

Feature / Metric	Gemini 2.5 Flash	Gemini 3.0 Flash (Preview)	Gemini 3.1 Flash-Lite	Gemini 3.5 Flash (New GA)
Primary Focus	Baseline Speed & Format	Core "Thinking" Engine	Ultra-low Latency / Cost	Sustained Agentic Loops
Internal Reasoning	None (Direct Output)	Visible Thinking Chain	2.5x Faster First Token	Auto-Thought Preservation
Default Effort Level	N/A	High (Slower, heavy logic)	Low / Minimal	Medium (Best cost/quality)
API Parameter Style	Traditional Sampling	Introduced `thinking_budget`	Transitioned configuration	Removed Sampling (`top_p`, etc.)

Step-by-Step Architectural Additions
1. Gemini 2.5 Flash: The Baseline
This version focused strictly on speed, structured outputs (like markdown tables and headers), and basic multi-modality. It operated entirely as a traditional LLM—you give it a prompt, and it immediately generates a direct response text block without any internal logical scratchpad.

2. Gemini 3.0 Flash: The "Thinking" Upgrade
Google completely rebuilt the foundation to introduce Native Chain-of-Thought Reasoning to the Flash line.

Thinking Level Blocks: For the first time, Flash could write hidden or visible "thoughts" before outputting an answer.
The Problem: The initial preview defaulted to a high thinking level, which caused higher latencies and consumed significant token volume for simpler requests.

3. Gemini 3.1 Flash-Lite: The Velocity Optimization
Instead of pushing intelligence further, 3.1 branched sideways to target ultra-fast automation layers and classification queues

Speed Overhaul: It brought a massive 2.5x speed boost to the Time-To-First-Token (TTFT) compared to v2.5.
Cost Slashing: It cut the deployment price down to just $0.25 per million input tokens.
API Control: Introduced the adjustable thinking_level parameter (minimal, low, medium, high) to let developers explicitly choose between speed and intelligence.

4. Gemini 3.5 Flash: The Autonomous Agent Masterpiece
The new v3.5 Flash is built specifically for sustained, long-horizon agentic workflows (running multi-step loops, tool execution, and code generation).

Thought Preservation (The Biggest Addition): In older versions, if you had a 5-turn conversation, the model re-evaluated its thinking from scratch each turn. v3.5 automatically carries forward and locks its reasoning context across turns without changing your API usage rules. This yields a 42% performance jump on multi-turn benchmarks.
Optimized Defaults: The default effort is tuned to medium. Google significantly overhauled the low thinking tier so it can process complex code and parallel agent execution loops at near-zero latency.
API Clean-up: Traditional sampling parameters like temperature, top_p, and top_k are completely deprecated and no longer recommended. The model handles its own internal token probability based entirely on the thinking_level you request.
Managed Agents Ecosystem: Launches deep platform integration for autonomous sub-agents executing code inside isolated sandboxes.

Bahala na yung user sa corresponding rate limits at baka mamali pa ako, he he. Basta isipin nyo na lang na pag free ay may capping, either sa tokens, rpm, tpm, rpd, etc.
Ang last recollection ko is v2.5 allows ~250 RPD, pero sa Gemini Cli mataas dyan. Sa iba, like Gemini Code Assist, baka mas lalong bumaba pa. Frontier class level na kasi siya kung ipapagamit as free model.

PS: To get an idea sa rate limits, read this:
You do not have permission to view the full content of this post. Log in or register now.

Diego Mendoza · May 20, 2026

alist1986 said:
Ni-refesh ko nga api ko tukayo nang mabasa ko ito, para lumabas yan sa aking GUI at binusisi yung advantages nya. Ito yung specs nya at kung papaano gamitin para di masayang yung tokens. May apat siyang thinking effort levels. Basahin na lang ng user to understand the meaning.
You do not have permission to view the full content of this post. Log in or register now.

Ito yung version changes from v2.5 flash to gemini-3.5-flash (GA)
Quick Summary Table of Structural Changes

Feature / Metric Gemini 2.5 Flash Gemini 3.0 Flash (Preview) Gemini 3.1 Flash-Lite Gemini 3.5 Flash (New GA)
Primary Focus Baseline Speed & Format Core "Thinking" Engine Ultra-low Latency / Cost Sustained Agentic Loops
Internal Reasoning None (Direct Output) Visible Thinking Chain 2.5x Faster First Token Auto-Thought Preservation
Default Effort Level N/A High (Slower, heavy logic) Low / Minimal Medium (Best cost/quality)
API Parameter Style Traditional Sampling Introduced thinking_budget Transitioned configuration Removed Sampling (top_p, etc.)

Step-by-Step Architectural Additions
1. Gemini 2.5 Flash: The Baseline
This version focused strictly on speed, structured outputs (like markdown tables and headers), and basic multi-modality. It operated entirely as a traditional LLM—you give it a prompt, and it immediately generates a direct response text block without any internal logical scratchpad.

2. Gemini 3.0 Flash: The "Thinking" Upgrade
Google completely rebuilt the foundation to introduce Native Chain-of-Thought Reasoning to the Flash line.

Thinking Level Blocks: For the first time, Flash could write hidden or visible "thoughts" before outputting an answer.

The Problem: The initial preview defaulted to a high thinking level, which caused higher latencies and consumed significant token volume for simpler requests.

3. Gemini 3.1 Flash-Lite: The Velocity Optimization
Instead of pushing intelligence further, 3.1 branched sideways to target ultra-fast automation layers and classification queues

Speed Overhaul: It brought a massive 2.5x speed boost to the Time-To-First-Token (TTFT) compared to v2.5.

Cost Slashing: It cut the deployment price down to just $0.25 per million input tokens.

API Control: Introduced the adjustable thinking_level parameter (minimal, low, medium, high) to let developers explicitly choose between speed and intelligence.

4. Gemini 3.5 Flash: The Autonomous Agent Masterpiece
The new v3.5 Flash is built specifically for sustained, long-horizon agentic workflows (running multi-step loops, tool execution, and code generation).

Thought Preservation (The Biggest Addition): In older versions, if you had a 5-turn conversation, the model re-evaluated its thinking from scratch each turn. v3.5 automatically carries forward and locks its reasoning context across turns without changing your API usage rules. This yields a 42% performance jump on multi-turn benchmarks.

Optimized Defaults: The default effort is tuned to medium. Google significantly overhauled the low thinking tier so it can process complex code and parallel agent execution loops at near-zero latency.

API Clean-up: Traditional sampling parameters like temperature, top_p, and top_k are completely deprecated and no longer recommended. The model handles its own internal token probability based entirely on the thinking_level you request.

Managed Agents Ecosystem: Launches deep platform integration for autonomous sub-agents executing code inside isolated sandboxes.

Bahala na yung user sa corresponding rate limits at baka mamali pa ako, he he. Basta isipin nyo na lang na pag free ay may capping, either sa tokens, rpm, tpm, rpd, etc.
Ang last recollection ko is v2.5 allows ~250 RPD, pero sa Gemini Cli mataas dyan. Sa iba, like Gemini Code Assist, baka mas lalong bumaba pa. Frontier class level na kasi siya kung ipapagamit as free model.

PS: To get an idea sa rate limits, read this:
You do not have permission to view the full content of this post. Log in or register now.

Marami salamat po master. Naku Google pa heheh

alist1986 · May 21, 2026

Diego Mendoza said:
Marami salamat po master. Naku Google pa heheh

Nauna ka pa nga sa'kin he he.
Talagang fans ako ni Sensei, basta free. Alam na nga niya sumagot pag-umpisa pa lang ng prompt ko mismo sa Google Chrome..."JC, you want to bypass something today, he he?". Ginaya pa ako.
Sa mga web chat, siya yung usual kong gamit. Nakasanayan ko lang dahil madali siyang pasagutin pag umayaw siya.
Sa AI apps, halu-halo na yung prefernces ko, depended na tasks. Di n man ako mapili sa models dahil yung latest models, wala namang pinagkaiba kundi token at rpm lang na di ko naman kailangan. Ginaawan ko na lang ng chart yung models ko para lam ko kung sino yung bagay sa gusto kong mangyari at yung AI app or interface na bagay din. Malalaman mo na lang sa tagal mong gamit.
Pero ang maganda diyan sa v3.5, di na gagamitan pa ng temp., top_P, at top_K settings, at sa effort thinking na lang. All in one na from chat to medium tasks. Mas compatible siyang gumamit ng extra tools at agentic loops. Di kasi lahat ng LLM na may tool calling mode ay masunurin lalo pa kung marami sila he he. Yan yung titingnan ko sa mga free at ρáíd models. Sa open-webui, pwede mo di bang i-connect siya side-by-side sa image generators ng google like nano banana or imagen ( same effect using gemini chat platform) + extra free tools doon - basta may api ka. For personal use kuntento na ako sa Gemini models. Tamang-tama lang. Sa free api usage nya, ok na.
Pero naka-abang lang yung naipon kong ai apis sa iba, para laging handa. Yan yung Yamasita treasure ko, na kayang bumili ng house and lot sa Dasmarinas pag na-convert sa peso, he he. At madali rin mawala na parang bula. Pero si Sensei pa rin ang hanap ko kahit malimit huli na sa panahon ang sagot, sablay sa html coding at kalimutin pa, he he.
Subukan kong makuha yung persona ni Sensei kay v3.5-flash, at ma-obserbahan yung difference.

kenesuino · May 22, 2026

Grabe naman usage sa Github Copilot si 3.5 Flash hahaha, 14x

Diego Mendoza · May 22, 2026

alist1986 said:
Nauna ka pa nga sa'kin he he.
Talagang fans ako ni Sensei, basta free. Alam na nga niya sumagot pag-umpisa pa lang ng prompt ko mismo sa Google Chrome..."JC, you want to bypass something today, he he?". Ginaya pa ako.
Sa mga web chat, siya yung usual kong gamit. Nakasanayan ko lang dahil madali siyang pasagutin pag umayaw siya.
Sa AI apps, halu-halo na yung prefernces ko, depended na tasks. Di n man ako mapili sa models dahil yung latest models, wala namang pinagkaiba kundi token at rpm lang na di ko naman kailangan. Ginaawan ko na lang ng chart yung models ko para lam ko kung sino yung bagay sa gusto kong mangyari at yung AI app or interface na bagay din. Malalaman mo na lang sa tagal mong gamit.
Pero ang maganda diyan sa v3.5, di na gagamitan pa ng temp., top_P, at top_K settings, at sa effort thinking na lang. All in one na from chat to medium tasks. Mas compatible siyang gumamit ng extra tools at agentic loops. Di kasi lahat ng LLM na may tool calling mode ay masunurin lalo pa kung marami sila he he. Yan yung titingnan ko sa mga free at ρáíd models. Sa open-webui, pwede mo di bang i-connect siya side-by-side sa image generators ng google like nano banana or imagen ( same effect using gemini chat platform) + extra free tools doon - basta may api ka. For personal use kuntento na ako sa Gemini models. Tamang-tama lang. Sa free api usage nya, ok na.
Pero naka-abang lang yung naipon kong ai apis sa iba, para laging handa. Yan yung Yamasita treasure ko, na kayang bumili ng house and lot sa Dasmarinas pag na-convert sa peso, he he. At madali rin mawala na parang bula. Pero si Sensei pa rin ang hanap ko kahit malimit huli na sa panahon ang sagot, sablay sa html coding at kalimutin pa, he he.
Subukan kong makuha yung persona ni Sensei kay v3.5-flash, at ma-obserbahan yung difference.

Wow ang galing mo tlga master. Hindi KO pa masyado nasubukan si 3.5 heheh bilis Kasi mag limit

Diego Mendoza · May 22, 2026

poiuytrewq15 said:
View attachment 4190474Grabe naman usage sa Github Copilot si 3.5 Flash hahaha, 14x

Grabe nman Yan hahah

alist1986 · May 22, 2026

Hindi naman ako magaling sa sinabi ko. Sa pagsasanay nakukuha yon. Si Sensei, as an example, ay willing ibigay yung persona at system prompt nya, pero ayaw lumabas yung binigay niya sa response box at puro artifacts or citations links lang. Siya pa yung nagpupumilit magbigay habang ako ay nag-iintay lang. Dinaan ko na lang sa biro, at iniwasan yung mga salitang mag-trigger ng guadrails galing sa akin. Yon, biglang lumabas lahat without pressure. "Wido" yon out of practice or just coincidental.

Di ba pag paulit-ulit mong ginagawa sa isang app yung mga basic usage ng models at alam mo silang kontrolin ayon sa pagkagawa nila, madali na para sa'yo without knowing it, in the long run, he he.
Yung mga models ngayon na maraming modals and toolsets, especially with thinking, reasoning at agentic loops, pag hinayaan mo as-is ay matakaw sa tokens. Yan yung hidden advantage ng mga AI providers. Eka nga "tubong lugaw" pag ginamit sila as "default" kahit pa free or ρáíd.

Though ang free models ay harmless dahil libre nga (masakit pag binayaran, he he), gamitin mo yung api nila sa platform na pwede silang pigilan sa kanilang internal tools para di tumakaw sa tokens and requests. Medyo mahirap na nga mag-tweak sa v3.5, pero pwede pa sa v2.5. At pwede mo silang palakasin with added tools and skills. Sa open-webui kaya yang gawin, at sa iba pa - even sa python script. Ex. Yung GPT-4o na maraming nabibigay ng libre dahil ayaw pansinin (dahil ang focus ay yung mga bago), medyo baguhin mo yung system prompt nya sa gusto mo for customized tasks, maximize mo yung token limit, baguhin mo yung temp. sa 0.2 para iwas hallucinate, lagyan mo ng websearch at RAG, bigyan mo siya ng access sa python execution, etc. Lahat ng extras na yan libre naman. Pero each model version has its own comparible perks to add in open-webui. My example for GPT-40 will not fit on other models, sa kanyan lang. Sa'kin, deepseek-v4 or the latest qwen is as good as the latest Claude-Opus/Sonnet if you modify them correctly. Ask a good AI to assist you on this and check, until you found the "sweet spot". Para yang Voltes-V, pag sa single component mahina, pero pag nag-boltin super na.....tarantantanan, tantan, tururururu! Money-making market ang AI, kaya they lure customers for profit. Huwag tayong masisilaw sa pangalan. Be skeptic at all times.
As users, dapat alam nating yung kanilang products at yung alternatives for cheaper/better use. Yan yung purpose ng open-source. Merong legal at ilegal din, he he.

alist1986 · May 23, 2026

poiuytrewq15 said:
View attachment 4190474Grabe naman usage sa Github Copilot si 3.5 Flash hahaha, 14x

Ang ibig sabihin nyan, mabilis maubos ang credits mo pag ginamit he he.
PS. Teka, 0.5x model yan. Bakit ganyan lumabas.

Search

Search

Gemini 3.5 Flash: Frontier Intelligence At Full Speed

Your feedback is highly appreciated

alist1986

Forum Guru

Diego Mendoza

alist1986

Forum Guru

kenesuino

Diego Mendoza

Diego Mendoza

alist1986

Forum Guru

alist1986

Forum Guru

Similar threads

About this Thread

New Topics

Codex/TGT - 5 hour limit temporarily removed

100% FREE Way to Turn Your Android Phone Into an SMS Gateway

AI tools for Manual testing - Any recommendation?

Introducing GPT-5.6: Frontier Intelligence at Every Scale

Caveman Skill para sa Claude Code or Codex. (Pampababa ng usage at pampabilis ng response)

ᑕᕼᗩTGᑭT o Gemini

Looking for Replit account or pasabit

SAMSUNG S20 KG LOCK

Looking for unlocktools for rent 6 hours

Introducing GPT-Live: Real-Time Voice Interaction in ᑕᕼᗩTGᑭT

Trending Topics

Online now

Forum statistics

Gemini 3.5 Flash: Frontier Intelligence At Full Speed

Your feedback is highly appreciated​

​

​

Forum Guru

Forum Guru

Forum Guru

Forum Guru

Similar threads

About this Thread

Trending Topics

Online now

Forum statistics

Your feedback is highly appreciated