❓ Help: Text to Speech for Tagalog and English

OmniVoice is the one with an external service that also handles Tagalog well and adds expressions automatically. I tested it on their site (link hidden by the forum) a few months back, using zero-shot cloning. It even picks up accents (Manila, Cavite, Bisaya, Ilokano, Ilonggo...) as long as you have a 10-second clip. That option just isn't available in ttsomni. It supports 600 languages. Keep an eye on TTS updates, because the dev trend is adding a voice-clip cloning feature to create a multi-language replica. It's the same idea as adding a LoRA to an LLM for customized/specific use, so the user doesn't have to train it.
For regular reading aloud I use Supertonic TTS, paired with Gemini Nano (LLM) for light tasks, all local AI. I use Gemma4-2b-Q4 when I need tool-calling for simple automation. Even a 3rd-gen PC can run that.
OmniVoice with voice cloning can run in CPU mode, but I haven't tried that yet. For voice cloning, just skip the ASR step (via Whisper) to reduce the work on the CPU and transcribe the audio yourself. VibeVoice also works fine in CPU mode. You should try local AI first instead of relying on online services, which are very limited.
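A minimal sketch of the "skip the ASR" idea, in case it helps. Everything here is hypothetical: `reference_transcript` and its `asr` argument are names I made up for illustration, not part of OmniVoice or Whisper. The point is only that a hand-typed transcript is used first, so the speech recognizer never has to run on the CPU.

```python
from typing import Callable, Optional


def reference_transcript(
    manual_text: Optional[str],
    audio_path: str,
    asr: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the transcript of the reference clip used for cloning.

    A hand-typed transcript wins, so the ASR callable is never invoked.
    `asr` is a hypothetical stand-in for whatever recognizer you would
    otherwise run (for example a Whisper wrapper you wrote yourself).
    """
    if manual_text and manual_text.strip():
        return manual_text.strip()
    if asr is None:
        raise ValueError("No manual transcript and no ASR fallback given")
    return asr(audio_path).strip()
```

Pass your own text for the 10-second clip and leave `asr` unset on a weak CPU.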

Test it here: (link hidden by the forum)
Look for OmniVoice or a clone of it, like Ming-omni-tts, on Hugging Face Spaces to test.
Before, my target audience was Filipinos and Gemini Audio was fine for them. Now that I have an American audience, the comments keep saying they're irritated by the Gemini AI voice. I think my competitor is using an ElevenLabs voice.
 
Here's the thing. If the speech/intonation and grammar are already good, swap the voice via cloning so it isn't obvious which AI you're using. With default voices, it's obvious.
The easiest way for me, in Tagalog or even English, is to do the speaking myself so the essence of the expressions and emotions comes through, then I just change the voice, gender, and age. You don't need TTS at all if that's really what you're after in your project.
For Tagalog, even ElevenLabs still doesn't fully get the pronunciation right. It's popular only because of its variety of premium voices and languages. There are plenty of better ones, but their services are very expensive.
With OmniVoice, which is a diffusion language model-style TTS, you need to enhance the audio a little, especially on the free generation, to get it on par with ElevenLabs. That's because the free service uses a small model, the 0.5 GB one. Use the non-verbal metatags, what they call design and attributes instructions, plus the punctuation and pronunciation overrides. Check the docs, it's all in there. That way you can control the voice (loud/whispered), gender, age, pauses, etc. instantly, even within a single sentence. Others don't have that! I don't know if the small model fully supports it, but the large model does.
 
Have you tested Kokoro? They have models that sound like OpenAI tts-1.

This is the fastest way to use it:
Click this link: (link hidden by the forum)
If you don't have a GPU, click cpu.
Type your text, then click generate voice.
The audio comes out fine.
The comparison is up to you.
If you ask for OpenAI Voice | Kokoro Web maps it to Native Voice ID | Voice Characteristic
🎙️ alloy | af_heart | American Female (Default / Balanced)
🎙️ nova | af_bella | American Female (Bright & Energetic)
🎙️ shimmer | af_sky | American Female (Soft & Clear)
🎙️ onyx | am_adam | American Male (Deep & Authoritative)
🎙️ echo | am_michael | American Male (Crisp & Resonant)
🎙️ fable | bf_emma or bm_george | British Accent Alternative
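
If you script this instead of clicking through the web page, the mapping listed above can be kept as a small lookup. This is only a sketch: the voice IDs are copied from the list, but `to_kokoro_voice` and its fallback behavior are my own assumptions, not something Kokoro Web provides.

```python
# Voice IDs copied from the mapping above; the helper itself is hypothetical.
OPENAI_TO_KOKORO = {
    "alloy": "af_heart",
    "nova": "af_bella",
    "shimmer": "af_sky",
    "onyx": "am_adam",
    "echo": "am_michael",
    "fable": "bf_emma",  # the list also offers bm_george as an alternative
}


def to_kokoro_voice(openai_name: str, default: str = "af_heart") -> str:
    """Translate an OpenAI voice name into a Kokoro voice ID.

    Unknown names fall back to `default` (an assumption; af_heart is
    described as the default/balanced voice in the list).
    """
    return OPENAI_TO_KOKORO.get(openai_name.strip().lower(), default)
```

For example, `to_kokoro_voice("Nova")` gives `af_bella`, and an unknown name falls back to `af_heart`.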

Note: This is the hint for the version used in my test.
-----------------------------------------------------------

How to Control the Audio Using Just Plain Text
Since you cannot use tags, you can guide the AI engine using these text-only patterns:
  • To create short pauses: Use standard commas (,) or semicolons (;).
  • To create long, dramatic pauses: Add an ellipsis (...) or put an isolated period on its own line (.).
  • To change voice cadence / prevent flattening: Insert em-dashes (—) before and after action tags (like — the farmer says. —). This causes the engine to shift its tone naturally.
  • To change overall speed: Use the native "Speed" slider directly on the web interface page instead of typing text rules.
It is not compatible with SSML prosody tags.
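
Here is a rough sketch of how those text-only patterns could be applied before pasting text into the page. The function names are made up for illustration, and the tag-stripping regex is a simple assumption of mine, not a full SSML parser. It drops SSML-style tags (since they are not supported), wraps action tags in em-dashes, and joins parts with an ellipsis for a long pause.

```python
import re

EM_DASH = "\u2014"  # the em-dash character used around action tags


def strip_ssml(text: str) -> str:
    """Remove SSML-style tags such as <prosody ...>, keeping the inner text."""
    return re.sub(r"</?[A-Za-z][^>]*>", "", text)


def wrap_action_tags(text: str, phrases) -> str:
    """Put an em-dash before and after each listed phrase."""
    for phrase in phrases:
        text = text.replace(phrase, f"{EM_DASH} {phrase} {EM_DASH}")
    return text


def join_with_long_pause(parts) -> str:
    """Join text chunks with an ellipsis to request a long pause between them."""
    return " ... ".join(part.strip() for part in parts)


def prepare(text: str, action_tags=()) -> str:
    """Clean the text and apply the em-dash pattern, then tidy spacing."""
    text = wrap_action_tags(strip_ssml(text), action_tags)
    return re.sub(r"[ \t]{2,}", " ", text).strip()
```

Speed is left out on purpose, since the hint says to use the "Speed" slider on the page for that.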

Pick a Kokoro model whose speed works for you. The Q4/4-bit one suits a low-RAM CPU. I use the 154 MB one, but it takes a long time for 1 minute of audio even though CPU usage stays low: about 6-7 minutes per 1 minute of audio on my 3rd-gen laptop.
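
To put that timing in numbers: 6-7 minutes of processing per 1 minute of audio is a real-time factor of about 6 to 7. A small sketch of the arithmetic follows; the function names are mine, and the estimate assumes the time scales linearly with audio length, which I have not verified.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio length; values above 1 are slower than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def estimate_minutes(audio_minutes: float, rtf: float) -> float:
    """Estimated generation time, assuming it scales linearly with audio length."""
    return audio_minutes * rtf


# Figures from the post: 6 to 7 minutes for 1 minute of audio.
rtf_low = real_time_factor(6 * 60, 60)
rtf_high = real_time_factor(7 * 60, 60)
```

So under that assumption, a 5-minute narration would take roughly 30 to 35 minutes on the same laptop.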

This is what I use now because it has emotions and it's even faster than SAPI5: https://phcorner.org/threads/2530259/
The TTS conversion is almost instant, and the audio is fine for listening to text news.
 
