Free TTS, STT and other AI audio related trends - Running Thread

alist1986 · Feb 8, 2025

Try this one with emotions (samples):

[samples hidden]

Go back and recheck the older free online TTS services: "emotions" are now the trend in TTS and voice-generation apps, the same way "reasoning" GPTs and LLMs are the trend in language processing. (Expect quick updates, because the techniques the DeepSeek team used for DeepSeek-R1 are now open to the public. DeepSeek has no TTS/STT of its own yet, but there are plenty of open-source APIs you can link to it in modern AI interfaces.)

With TTS-only tools, emotion is usually set manually, but there are premium AI systems (local/online) that react and respond automatically according to the theme of the sentences. The GPT-4o API and up (o1 & o3) have this built into their multimodal API features, including speech emotion recognition for ASR with STT. Check the model cards of other recent LLMs to see whether they have it too.

Penduko888 · Feb 9, 2025

Let me take a peek, please.

cenahum · Feb 24, 2025

Hi guys
good information here in this post
Thanks

alist1986 · Feb 24, 2025

For slightly advanced projects, check out the 20 or so projects by [link hidden].
I'll also add this old and reliable TTS from Suno AI: [link hidden]. Just add ad-libs and it can also laugh and sing, he he.

alist1986 · Mar 17, 2025

Right now, the best choice for offline TTS with automatic emotions is Kokoro-TTS. Make that your project; it is active on GitHub. It is also supported in [link hidden], as stated in their [link hidden]. Good for CPU mode.

But for high-end use, search for ByteDance Speech. Even its lip sync is perfect!
[link hidden]

dhevy_75 · Apr 8, 2025

Is there a version of this for laptops? I can't download the installer. Hoping someone can help, guys.

banong_gang · Feb 28, 2026

@alist1986 boss, what offline/CPU-based TTS would suit a ThinkPad T480? Kokoro seems to generate speech with no emotion.

alist1986 · Mar 1, 2026

banong_gang said:
thinkpad t480

You're right, and I stand corrected. Kokoro-TTS only gets emotions and expressions through prosody. You have to pick a model trained on a specific emotion persona. By default, it is like an enhanced version of Edge TTS with minimal resources, because it is lightweight. It runs fine on my 12+-year-old i7 laptop, though... The ones below take a long time to process.

For TTS with emotions, Orpheus TTS might work on your setup; make sure you have enough RAM, because its models are large. Here: [link hidden]. A somewhat faster one for CPU is this: [link hidden]

Look through this list to see which ones have a CPU mode: [link hidden]

As for running local AI on CPU, the usual optimizations are quantized models, ONNX Runtime, OpenVINO, or C++ (cpp) libraries, just to get the models running for testing... A GPU is recommended for faster rendering.
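To give a feel for what "quantized models" means, here is a minimal sketch of symmetric int8 quantization in plain Python. It only illustrates the idea (store small integers plus one scale factor instead of 32-bit floats); it is not how ONNX Runtime, OpenVINO, or any particular cpp library implements it, and the function names are made up for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (int8 values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    # Round to the nearest step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 0.3]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)         # small integers, 1 byte each instead of 4
print(restored)  # close to the original weights, not identical
```

The trade-off is a model roughly 4x smaller than float32 with a small rounding error per weight, which is why quantized builds are the usual first thing to try on a CPU-only laptop.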

I use online APIs for TTS, bridged to SAPI5, so my PC doesn't struggle, he he.

alist1986 · Mar 11, 2026

For those looking for offline TTS with acceptable emotions and expressions (low to high, controllable), try these.
1. XTTS2 - the successor model from Coqui
Recommended Repositories

[link hidden]: A clean, text-based voice cloning UI. It supports 16 languages and allows voice recording/uploading directly in the interface.
[link hidden]: A comprehensive local web interface with easy installation scripts for both Windows and Linux.
[link hidden]: A FastAPI-based server designed for integrating XTTSv2 into other applications like SillyTavern.
[link hidden]: Combines XTTSv2 with RVC (Retrieval-based Voice Conversion) to improve output quality beyond standard XTTS cloning.
[link hidden]: A local application specifically designed for creating audiobooks using XTTSv2.

2. Fish-audio - with high-quality expression and emotions
(on their online website, [link hidden]. The models may be lurking somewhere on the Hugging Face Hub, he he.)

Primary Repositories

[link hidden]: The flagship repository for their state-of-the-art (SOTA) open-source TTS. It features the Fish Speech V1.5 and the newly announced S2 model, which supports fine-grained emotional control (e.g., [laugh, whispers]) and zero-shot voice cloning with as little as 10 seconds of audio.
[link hidden]: A framework for TTS, Singing Voice Synthesis (SVS), and Singing Voice Conversion (SVC) based on diffusion models. It is designed to be simpler and more modular than original diffusion-based SVC repositories.
[link hidden]: A utility toolkit for preparing audio datasets. It includes scripts for vocal separation, automatic slicing, loudness matching, and transcription via WhisperX or FunASR.
[link hidden]: A repository focused on the VITS2 backbone integrated with multilingual BERT for improved prosody and naturalness.
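If you script your inputs, it helps to see which emotion cues a line carries before sending it to a model. The sketch below assumes the cues are written inline in square brackets, like the [laugh, whispers] example above; the exact tag names and syntax a given Fish Speech release accepts are not confirmed here, so check the model's own documentation. `split_cues` is a hypothetical helper, not part of any Fish-audio package.

```python
import re

# Assumption: cues are inline square-bracket tags such as [laugh] or [whispers].
TAG = re.compile(r"\[([^\[\]]+)\]")


def split_cues(text):
    """Return (plain_text, cues); cues is a list of (offset_in_plain_text, tag)."""
    plain_parts, cues = [], []
    pos = 0        # read position in the tagged text
    out_len = 0    # length of the plain text built so far
    for match in TAG.finditer(text):
        chunk = text[pos:match.start()]
        plain_parts.append(chunk)
        out_len += len(chunk)
        cues.append((out_len, match.group(1).strip()))
        pos = match.end()
    plain_parts.append(text[pos:])
    return "".join(plain_parts), cues


plain, cues = split_cues("That is hilarious [laugh] but keep it down [whispers] okay?")
print(cues)
print(plain)
```

This only inspects the text; the actual emotional rendering happens inside the model.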

3. IndexTTS2 - top/industrial grade offline TTS
Working GitHub Repositories

index-tts/index-tts: The primary repository for the model, which provides an "industrial-level" zero-shot TTS system. It includes a Gradio-based web interface and can be set up using the uv package manager.
[link hidden]: A key source for the IndexTTS2 codebase and pre-trained weights, focusing on the model's novel duration adaptation scheme and emotion-speaker decoupling.
[link hidden]: An enhanced fork that adds specialized features like batch inference and more granular control over the speed of speech while maintaining compatibility with the original project.
[link hidden]: A lightweight wrapper for users who want to integrate IndexTTS2 voice cloning and emotion control directly into their ComfyUI workflows.
[link hidden]: A wrapper designed for deploying the model on Replicate using Cog, allowing for zero-shot speaker cloning via API.

4. Chatterbox - text-to-speech models by Resemble AI

Core & Official Repositories

resemble-ai/chatterbox: The official primary repository. It houses the original model (0.5B parameters), the Multilingual version (23 languages), and the latest Chatterbox-Turbo, which uses a streamlined 350M architecture for ultra-low latency.
resemble-ai/chatterbox_demopage: The official GitHub Pages site for listening to high-fidelity audio samples.

Self-Hosting & API Implementations

[link hidden]: A feature-rich self-hosting solution. It includes a modern Web UI, support for batch processing (audiobooks), and hot-swappable engines for the Original, Multilingual, and Turbo models.
[link hidden]: A FastAPI-powered REST API that provides OpenAI-compatible endpoints. It is designed for easy integration into existing applications as a drop-in replacement for OpenAI’s TTS API.
[link hidden]: An alternative implementation that uses the uv package manager for fast installation and includes both a FastAPI server and a Streamlit UI.
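"OpenAI-compatible" here means the server accepts the same request shape as OpenAI's speech endpoint (POST /v1/audio/speech with model, input, voice, and response_format fields). A minimal sketch of building such a request with only the standard library follows. The base URL, port, model name, and voice name are placeholders I made up; a given self-hosted server may expect different values, so check its README.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice, model="tts-1", fmt="mp3"):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,            # placeholder; local servers often ignore or remap it
        "input": text,
        "voice": voice,            # placeholder voice name
        "response_format": fmt,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local server address.
req = build_speech_request("http://localhost:8000", "Hello from a local TTS server.", "alloy")
print(req.get_method(), req.full_url)

# To actually send it (needs a running server):
# with urllib.request.urlopen(req) as resp, open("out.mp3", "wb") as f:
#     f.write(resp.read())
```

Because the request shape is the same, an app already written against OpenAI's TTS usually only needs its base URL changed to point at the local server.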

Optimization & Specialized Tools

[link hidden]: A port of the model to the vLLM framework, achieving up to 10x speedup with batching and improved GPU memory efficiency.
[link hidden]: A fork specifically optimized for real-time streaming, reaching a latency to the first audio chunk of under 0.5 seconds on an RTX 4090.
[link hidden]: Custom nodes for integrating Chatterbox TTS and voice conversion directly into ComfyUI workflows.
[link hidden]: An implementation for the Model Context Protocol (MCP), allowing AI agents to generate and play speech automatically through a unified tool.
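If you want to compare streaming setups on your own hardware, "latency to the first audio chunk" is easy to measure for anything that yields audio as an iterator. This is a rough sketch: `fake_tts_stream` is a stand-in generator I wrote for the demo, not a real Chatterbox API, so swap in whatever streaming call your chosen fork exposes.

```python
import time


def time_to_first_chunk(stream):
    """Return (first_chunk, seconds_waited) for any iterable of audio chunks."""
    start = time.perf_counter()
    first = next(iter(stream), None)
    return first, time.perf_counter() - start


def fake_tts_stream(n_chunks=3, delay=0.01):
    """Stand-in for a streaming TTS call: yields silent chunks after a delay."""
    for _ in range(n_chunks):
        time.sleep(delay)
        yield b"\x00" * 320


first, latency = time_to_first_chunk(fake_tts_stream())
print(f"first chunk: {len(first)} bytes after {latency:.3f} s")
```

Run it a few times and skip the first result, since model loading and warm-up usually inflate the first call.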

5. Kitten TTS - minimalistic TTS with limited emotional range that runs fast in CPU mode
Core GitHub Repositories

[link hidden]: The official repository containing the core model code. It features multiple model tiers, including "Nano" (15M parameters), "Micro" (40M), and "Mini" (80M).
[link hidden]: A production-ready self-hosting server that adds a FastAPI backend, an intuitive Web UI, and GPU acceleration (not in the original version). It is ideal for generating audiobooks by splitting large texts.
[link hidden]: A repository focusing on developer integration and easy deployment for mobile or IoT devices. It includes prebuilt binaries and example voices for quick starts.
[link hidden]: A simple web-based demo project to test the model's capabilities in a browser environment.
[link hidden]: A specialized implementation for storytelling, utilizing KittenTTS for its natural prosody and low resource requirements.
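The "splitting large texts" step for audiobooks is simple enough to do yourself before feeding any of these models. Below is a minimal sketch that packs whole sentences into chunks under a character limit. The 200-character default is an arbitrary assumption, not a documented limit of Kitten TTS or any server listed above, and the sentence splitter is a naive regex that will trip on abbreviations.

```python
import re


def split_for_tts(text, max_chars=200):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole, not cut mid-sentence.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks


print(split_for_tts("One. Two! Three? Four.", max_chars=10))
```

Splitting on sentence boundaries keeps the prosody natural at the joins; you then synthesize each chunk and concatenate the audio.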

BONUS:
For minimalist users with no GPU but decent RAM in their PC, this will do:
TTS-Studio: [link hidden]
A unified web-based interface for multiple text-to-speech models - featuring Kitten TTS, Piper TTS, and Kokoro TTS running entirely in your browser! Switch between models seamlessly and choose the perfect voice for your needs.
If you want an all-in-one that can handle all of those models, use [link hidden]. Models are here: [link hidden]. It supports [link hidden], but you need the ONNX model. You can convert it with their tool or just go here: [link hidden]. Look for the "[link hidden]" model for Tagalog in ONNX form. Here it is, to make it easy: [link hidden]
===================================================================================
Note:

The first 4 need a decent GPU + enough system RAM to run properly; see their requirements.
You can try voice cloning in Tagalog in IndexTTS2 (it is good at mimicking, but not linguistically perfect). You can try it in XTTS2 as well.
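For a quick sanity check on RAM before downloading anything, you can estimate the size of the weights alone from the parameter count: parameters times bytes per parameter. The sketch below uses the parameter counts quoted in the lists above. It is a lower bound under my own simplifying assumptions; real usage is higher because of activations, audio codecs/vocoders, and the runtime itself.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(n_params, dtype="fp32"):
    """Approximate memory for the weights only, in MiB."""
    return n_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


models = [
    ("Kitten TTS Nano", 15_000_000),
    ("Chatterbox-Turbo", 350_000_000),
    ("Chatterbox (original)", 500_000_000),
]

for name, n_params in models:
    sizes = ", ".join(f"{d}: {weight_memory_mb(n_params, d):.0f} MB" for d in BYTES_PER_PARAM)
    print(f"{name}: {sizes}")
```

Always defer to each project's stated requirements; this is only for a first guess at whether a model is even in range for your machine.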

I've noticed a lot of people selling these free apps with some cosmetic UI on top. Not bad! But it's better to know how to use them yourself if you have the hardware.
===================================================================================
Good Luck.
