👨‍🏫 Tutorial Microsoft just dropped an open-source voice AI that actually slaps ElevenLabs and it's 100% FREE

Thank you, this is a big help for narrations.
A bonus tip for those who don't know yet: a free TTS that is reasonably acceptable to use is gemini-cli or gemini-cli-tts on GitHub (as one example), or just use Google's AI Studio. The free tier is decent, and the daily rate limits are fine compared with OpenAI, Grok, or Claude. It is more lightweight than running something locally. TTS on the lite model goes up to 1000 RPD (requests per day) with a limit of 10 minutes (max) per request; the standard is 250 RPD. That is every day, on a single account. The nice thing about the Gemini TTS models is that they already understand SSML breaks or pauses, prosody, and emotional/expressive markers. Just guide it through its LLM service using your own text samples. Experiment to see whether the free models can handle the pause and prosody controls, because they act up when a long text carries too many tags, as the Flash models do. The same goes for OpenAI's TTS models and others, if you have used them. So it is better to chunk your text so the TTS model doesn't choke. Even VibeVoice behaves this way.
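To make the chunking advice concrete, here is a minimal Python sketch. The function name `chunk_text` and the 600-character default are my own assumptions, not limits published by any TTS provider; tune the size to whatever your model handles cleanly.

```python
import re

def chunk_text(text, max_chars=600):
    """Split text into chunks of at most max_chars, breaking at sentence ends.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace; punctuation stays attached.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate       # still fits (or the chunk is empty)
        else:
            chunks.append(current)    # close the chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Send each chunk to the TTS as a separate request, then join the audio files in order.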
 
Thank you for this. Because of you I learned SSML, and it works on Gemini AI. It's fun: I tell it to insert a "sigh" so it sounds really human, like breathing while reading.
 
Good to know you made it, he he. Actually, Gemini is the one that is somewhat "in" on the trends of Filipino grammar. That alone makes it worth it.
If that already satisfied you, it is even better to train an LLM like Gemini for language pre-processing, as an (optional) professional approach. The free Pro models are better for this purpose, even the older 1.5 Pro/2.5 Pro, since the free credits are acceptable when needed. You would use a RAG app like AnythingLLM with a knowledge base, using the free Gemini embedding model API or BGE-M3 (an offline model). The knowledge base consists of a Tagalog-English dictionary, a Tagalog slang dictionary, a Tagalog grammar guide, sample conversational texts, and so on (since Tagalog is often mixed with English, i.e. Taglish). That will help produce close-to-perfect Filipino grammar before the text goes to TTS (or cloning + TTS). This is for reference only, in case someone tries the idea in their own experiments! It is my personal project to train my AI to speak like me, he he.

What you did is right. You used the most important part: controlling the conversational text using prompts, like these examples:

1. For LLM pre-processing of text:

Code:
"Rewrite this Tagalog text into a TTS script using these tags: [br] for a short breath, [p:ms] for pauses in milliseconds, and <u> for stressed syllables. Ensure the rhythm sounds like a natural conversation, not a reading."

2. For TTS:

Code:
"Speak as a native Tagalog speaker from Manila. Use a conversational, non-robotic tone. Ensure clear distinction between 'o' and 'u' sounds. Maintain a slight breathiness at the end of sentences to sound more human."

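If your TTS engine does not understand the custom tags from the first prompt, you can convert them to standard SSML yourself. This is only a sketch: the mapping ([br] to a 250 ms break, <u> to strong emphasis) and the name `tags_to_ssml` are my assumptions, and whether a given engine honors `<break>` and `<emphasis>` still has to be tested by ear.

```python
import re

BREATH_MS = 250  # assumed length of a "short breath"; adjust to taste

def tags_to_ssml(script):
    """Convert [br], [p:ms] and <u>...</u> markers into SSML elements."""
    # [br] -> a short fixed pause
    ssml = script.replace("[br]", f'<break time="{BREATH_MS}ms"/>')
    # [p:400] -> a pause of 400 milliseconds
    ssml = re.sub(r"\[p:(\d+)\]", r'<break time="\1ms"/>', ssml)
    # <u>syllable</u> -> emphasized text
    ssml = re.sub(r"<u>(.*?)</u>", r'<emphasis level="strong">\1</emphasis>', ssml)
    return f"<speak>{ssml}</speak>"
```

For example, `tags_to_ssml("Kumusta[br] ka?[p:400]")` wraps the line in `<speak>` and replaces both markers with `<break>` elements.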
There is also an extra option when you have a dialogue (audio) that you want to imitate. Convert it into a timestamped script first. Usually Edge-TTS does the trick, or an online tool. Then prompt Gemini:

Code:
"Convert this timestamped transcript into a Gemini-compatible SSML script. Calculate <break> times based on the gaps between words. Use <prosody> and <emphasis> tags to reflect the 'excited' sentiment noted in the transcript."

This works well for TV series.
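The "calculate break times from the gaps between words" step can also be done without an LLM. A minimal sketch, assuming your transcript is already a list of (word, start, end) tuples in seconds; the function name and the 150 ms threshold are hypothetical choices of mine.

```python
def transcript_to_ssml(words, min_gap_ms=150):
    """Build SSML from (text, start_seconds, end_seconds) tuples.

    A <break> is inserted wherever the silence between two consecutive
    words is at least min_gap_ms; shorter gaps are treated as normal flow.
    """
    parts = []
    prev_end = None
    for text, start, end in words:
        if prev_end is not None:
            gap_ms = round((start - prev_end) * 1000)
            if gap_ms >= min_gap_ms:
                parts.append(f'<break time="{gap_ms}ms"/>')
        parts.append(text)
        prev_end = end
    return "<speak>" + " ".join(parts) + "</speak>"
```

The `<prosody>` and `<emphasis>` parts of the prompt depend on sentiment, so those are still better left to the LLM.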

As they say, use your imagination! You have an AI assistant to help you analyze the text and advise you on handling your conversational text. The rest is up to the AI to follow until you get your preferred results.
 
I'll bookmark this. Views were low on my monetized Tagalog YouTube channel, so I changed niche and now do English narration, which is fine because it only needs SSML. I'll follow your suggestion for future reference. Thank you again. Gemini really does struggle with the Tagalog word "TUMANGO", lol.
 
I'm not a YouTuber, just a viewer when there is something worthwhile. With AI you really do crawl in Tagalog; for now it is only at the "can put up with it" level. ElevenLabs, Fish Audio, Murf, and Gemini are the top 4. Even OpenAI fails at Tagalog TTS. It helps to have a paid ElevenLabs plan because it is the only one with many options that you can customize properly. Or put up with 10000 characters/day to see the difference. Have you tried Gemini 2.5 Pro TTS on the Google Cloud TTS free tier? It is free to test with the initial $300 credit.
In English almost all of them excel, so you will have no problem as long as the engine can handle emotions, expressions, and pauses. To make it unique, clone a perfect voice with the best-sounding TTS engine. For Tagalog it is trial and error with cloned audio, or with audio tags plus a prompt acting as your cloning sample director. Not every TTS supports SSML or pause/emotion/expression tags, and not every app has voice cloning on the side. You will have to look for an app that can follow what you want. You will also need AI audio/speech enhancement tools, depending on the sound you are after. For cloning, the latest free RVC app is what people commonly use online/offline (as long as the model you pick is filtered to RVC2 and RMVPE, at around 300-500 epochs). There are many Tagalog voices for it, or use your own voice. Just don't clone bad audio, he he, because the output will be just as bad.
 
I use Gemini 2.5 Pro on the free tier, with 5 email accounts every day, because I need 150 minutes a day: 75 minutes in the morning and 75 at noon. I'm afraid that many minutes a day would be expensive on ElevenLabs.
 
In that case, AI speech enhancement tools are all you need. ElevenLabs really is expensive, he he. You would need 15 accounts to make up the 150 minutes.
I haven't tried stacking API credits for TTS yet. For LLMs, even hundreds of thousands is doable, he he.
I found a trick on the net that I have on my desktop. It seems to work, but I found it slow since it is a local server. A backup if needed. It also covers ElevenLabs. I'll PM you if you're interested.

PS. Forget it. I thought there would be an unlimited effect if I cloned the online web server to my desktop and removed the restrictions and/or created sub-accounts. It turns out to be the same as using multiple accounts with rotating proxies online, which I don't want to use. It is the same as what you did with Gemini. Privacy is the only advantage.
At least I learned a new method, even if a crude one, he he. It can also be automated.
 
I'm really a noob. What are LLMs? I have a low-spec laptop: 8th-gen i5, 8 GB. Will that work? Thank you.
 
Oh my, I just woke up, and my brain got wrung out by that special project while a little one beside me also kept me up, he he. What I was referring to is puter.com, which has multi-AI support. It can be cloned to the desktop, but that is not recommended for our purpose. Just create multiple accounts there to test the premium ElevenLabs AI models. Use it alongside ElevenLabs to increase your credits. It is not unlimited as marketed on the net, but you can still test a number of requests on the premium models until you hit a 429 error. It is like the accounts on Groq and Google: with the lower models the free daily requests are high; with the premium ones it may be over after just a few. It is pay-as-you-go, after all. I gave the reason in my last (PS) post. There is no foolproof method, because even if you get in through puter.com, the AI providers will still flag you once you exceed the limit, given how many countermeasures they have against people like us, he he. Sometimes you can get by with incognito or a personal browser profile, but one mistake might block everything. That is why I switched to external providers the legal way; they are bundled as one API, so it is easy to maintain. There are only a few TTS APIs, and what gets counted is hours over a limited time period.
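On the 429 error: that status means "Too Many Requests", and the usual client-side response is to wait and retry instead of hammering the server. A minimal sketch of exponential backoff follows. `RateLimited` and `call_with_backoff` are hypothetical names of mine, and `request_fn` stands for whatever function sends your TTS request and raises `RateLimited` when it sees a 429. Note that this only helps with short-term throttling; it will not bring back a daily quota that is already used up.

```python
import time

class RateLimited(Exception):
    """Raised by the caller's request function on an HTTP 429 response."""

def call_with_backoff(request_fn, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, waiting 1s, 2s, 4s, ... between rate-limited attempts."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)
```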

(LLM stands for large language model, such as OpenAI's GPT models and the Gemini models, used mainly for text generation such as chat.)

Your PC is OK for local AI, but 8 GB is small if you are on Windows 10/11. Just starting Windows and a browser may already use up half of it. It is DDR4, so check your hardware: it either has internal (soldered) RAM plus a single slot, or no internal RAM and two slots. Better add more. Increase your RAM; if you have the budget, make it 16-32 GB at the least. RAM is cheap on Shopee/Lazada, but don't buy at the mall, he he. It is expensive there. An 8th-gen laptop i5 usually has 4 real cores (8 threads); the desktop i5 has 6 real cores. Does it have a dedicated GPU or just the Intel GPU, maybe an Intel UHD 620 or 630? That GPU's VRAM depends on your RAM because it uses "Shared Video Memory". Usually about 50% of the "existing" system RAM can be used by it. So it borrows from your RAM!

Since you use AI on the web, your PC can still handle your AI projects. With local AI it gets picky, depending on the size of the models you will use. Learn to run Python scripts first, and JS scripts using Node.js. It is a bit hard at the start because you will use CLI commands in the Windows Command Prompt or PowerShell. No coding is needed; only minor editing of inputs is required. On GitHub there are also executable releases for convenience. What you need to understand is when to delete their cache or working folder once it is no longer usable, so you can start fresh and save your remaining hard disk space. That is my problem right now. I lost 50 GB in 3 days of testing. I would wake up with <1 GB left on the HD.

Come back to me once you have upgraded. In the meantime I'll think about what can work on your existing hardware, or find an API alternative for your TTS project. The key to running locally on a low-spec PC, CPU only and at a faster speed, is to convert the original models to ONNX (quantized) models and find a GUI (preferably written in C++) that can run them, or make one. It is hard these days because the AIs support high-end GPUs, with only minor support for CPU. That is why most people just use the web to avoid the burden. So you have to pay or find free access.
 
PS. Sorry if some of what I say doesn't fit your existing knowledge. I'm still learning too, given how much there is in audio processing, including TTS. I prefer the source of the AI over the final product, so I always have an alternative when it fails. As long as I know how they were made, that is enough. For you, web platforms are better for now, to increase your productivity rather than spend more time on harder methods; maybe later. After all, what you want is TTS quality, which you can get online. Local VibeVoice is OK for English. Run its output through Adobe Podcast Enhance, which is free online, or through Audacity, and it will sound better. Try it first on the Hugging Face "Spaces" link below:
[link hidden by the forum]
Let me put it all together in one go:
It is a fast model, unlike the standard models with 1.5B and 7B parameters. This is all it can do because it is designed for real-time chat, so the expressions and emotions depend on the text it will read.
1. Punctuation Tricks for the 0.5B Model
The 0.5B model is a "context-sensitive" generator. It looks at the rhythm and punctuation of your sentence to decide its tone.
  • Artificial Pauses:
    • Ellipses (...): Best for a trailing, hesitant, or thoughtful pause.
    • Double Dashes (--): Creates a sharp, abrupt break in a sentence.
    • Multiple Commas (, ,): A community hack to force a slight "breath" between words without ending the sentence.
  • Emotion Shifting:
    • Exclamation Overload (!!!): Can trigger a higher pitch or more urgent delivery.
    • Question Marks (?): Naturally creates the "rising intonation" at the end of phrases.
    • Descriptive Lead-ins: Start your text with a mood-setter. Instead of just "Hello," try "She said excitedly, 'Hello!'" The model often carries the "excited" tone into the actual dialogue.
  • Speed Control:
    • Line Breaks: To slow down a fast talker, break your text into smaller paragraphs. The model treats each new block as a "fresh start," adding a natural reset and brief silence.
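Two of the tricks above (descriptive lead-ins and line breaks) are easy to apply automatically before sending text to the model. This is just a sketch with made-up helper names (`with_mood`, `slow_down`); how strongly the 0.5B model reacts to them is something you have to check by listening.

```python
import re

def with_mood(line, mood, speaker="She"):
    """Prefix dialogue with a mood-setting lead-in, e.g. 'She said excitedly, ...'."""
    return f"{speaker} said {mood}, '{line}'"

def slow_down(text, sentences_per_block=2):
    """Break text into small paragraphs so each block gets a fresh start."""
    # Split after ., ! or ? followed by whitespace (an ellipsis also splits here).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    blocks = [
        " ".join(sentences[i:i + sentences_per_block])
        for i in range(0, len(sentences), sentences_per_block)
    ]
    return "\n\n".join(blocks)
```

Keep in mind the model will read the lead-in aloud too, so trim it from the audio afterwards if you only want the dialogue.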
If you want VibeVoice to support marker tags, use [link hidden by the forum], where they are hardcoded for the 1.5B and 7B models. It auto-installs through the .bat script and can run on CPU. Look for a Q4 GGUF model of it so your PC doesn't gasp for breath. It is designed for that, which the original site using the Gradio web UI doesn't have. The other one, [link hidden by the forum], requires you to install the whole ComfyUI just for that.

If you want to clone quickly without training, find a service with zero-shot cloning. If quality is what you want, find a service with RVC2 and use its output as your base audio before it goes to the TTS. Besides that, also look for a service that has a prompt on the side to assist. With most of them you just upload audio and then type the conversational text. Consider all possibilities, whether you want the long or the short method, but be sure to stick first with what you know and are comfortable with, and where you won't run out of requests.
 
Thank you for all the knowledge about this. I'll try to let it sink into my brain. I wish I had taken a computer course instead of business admin, lol. Again, thank you.
 
No problem. Do some trial and error on your PC for that first. There are guides using Python and Node.js. Your goal for now is how to use them, not coding. That is easier. It is up to the user how fast they learn on their own. I just learned at home; it takes guts, he he.

Make up for it with RAM, which is cheaper than a new PC. I already had the money for a new one, but it got diverted elsewhere, so I have been making do with my existing 14-year-old PC that is on life support. I also have an 8th-gen i5 Lenovo at home, upgraded to 20 GB, but I haven't set it up for AI yet. It has only one slot because it has 4 GB of internal RAM.

I'll PM you when I think of an easy idea that takes just a few clicks.
 
I hope, sirs, that someone here can turn that into a ready-to-use app. And free too. Thank you in advance.
The app and models are in the link in the 1st post, and it really is free because it is open source for offline use. I have also mentioned the other apps for it and other related apps that are free too. Just follow the guides in the links.
Read the GitHub link to use its demo or run it yourself. There are links for each app there. The app runs in pure Python (as stated under Languages at the bottom right of the page) if you want to try it locally or on another server.
The link contains 3 apps: VibeVoice-ASR for STT, the standard VibeVoice-TTS, and VibeVoice-Realtime-TTS. Only the first has a Windows release; the rest you set up yourselves. GitHub is meant for devs, so it is a matter of luck whether a ready-made app gets released. Normally you have to know the programming language they use to run it on your own. That is the norm there. Check my posts so you can try VibeVoice online, or look for the Hugging Face or Colab links in what the thread starter (TS) gave.
 
