Free TTS, STT and other AI audio related trends - Running Thread

Try this one with emotions (samples): [content hidden]
Go back and revisit the older free online TTS services; "emotions" are now the trend in TTS and voice generation apps, much like "reasoning" GPTs and LLMs are the trend in language processing. (Expect quick updates, since the techniques the DeepSeek team used for the success of DeepSeek-R1 are now open to the public. Although they don't have TTS/STT yet, there are plenty of open-source APIs that can link them into modern AI interfaces.)

With TTS-only tools, emotion is usually set manually, but there are premium AI options (local/online) that automatically react and respond according to the theme of the sentences. The GPT-4o API and up (o1 & o3) has this built into its multimodal API features, including speech emotion recognition for ASR with STT. Check the model cards of other recent LLMs to see whether they have it.
 
For slightly more advanced projects, check out the 20 or so projects by [link hidden].
I'll also add this old and reliable TTS from SUNO AI, [link hidden]. Just add ad-libs and it can laugh and sing too, he he.
 
Right now, the best choice for offline TTS with automatic emotions is Kokoro-TTS. Make that your project; it's actively maintained on GitHub. It is also supported in [link hidden], as stated in their [link hidden]. Good for CPU mode.

But for high-end use, search for ByteDance Speech. It's perfect even for lip sync!
 
thinkpad t480
You're right, and I stand corrected. Kokoro-TTS only gets emotions and expressions through prosody. You have to pick a model trained on a specific emotion persona. By default, it's like an enhanced version of Edge TTS with minimal resources, since it's lightweight. It runs fine on my 12+ year old i7 laptop, though. The ones below take a long time to process.

For TTS with emotions, Orpheus TTS might work on your setup; just make sure you have enough RAM, because its models are large. Here: [link hidden]. A somewhat faster one for CPU is this: [link hidden]

Look through this list to see which ones have a CPU mode: [link hidden]

As for running local AI on a CPU, the usual approach is to optimize with quantized models, ONNX Runtime, OpenVINO, or C++ libraries, just to get the models running for testing. A GPU is recommended for faster rendering.
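
To make the "quantized models" idea concrete, here is a minimal sketch of symmetric int8 quantization in plain Python. It only illustrates the general technique, not how any TTS project mentioned in this thread actually implements it; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (illustrative helper, not a library API).

    Returns the integer values plus the scale needed to map them back to floats.
    """
    # Fall back to 1.0 so an all-zero tensor does not divide by zero.
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    # Round to the nearest step and clamp into the int8 range [-127, 127].
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the int8 values back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored weight is within half a quantization step of the original.
for original, approx in zip(weights, restored):
    print(f"{original:+.4f} -> {approx:+.4f}")
```

Storing each weight as one int8 instead of a 32-bit float cuts memory use by roughly 4x at the cost of a small rounding error, which is why quantized models are friendlier to CPU-only machines.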

I use online APIs for TTS, converted to SAPI5, so my PC doesn't struggle, he he.
 
For those looking for offline TTS with acceptable emotions and expressions (low to high, controllable), try these.
1. XTTS2 - successor to Coqui
Recommended Repositories
  • [link hidden]: A clean, text-based voice cloning UI. It supports 16 languages and allows voice recording/uploading directly in the interface.
  • [link hidden]: A comprehensive local web interface with easy installation scripts for both Windows and Linux.
  • [link hidden]: A FastAPI-based server designed for integrating XTTSv2 into other applications like SillyTavern.
  • [link hidden]: Combines XTTSv2 with RVC (Retrieval-based Voice Conversion) to improve output quality beyond standard XTTS cloning.
  • [link hidden]: A local application specifically designed for creating audiobooks using XTTSv2.
2. Fish-audio - with high-quality expression and emotions
(on their online website, [link hidden]. The models may be lurking somewhere on the Hugging Face Hub, he he.)

Primary Repositories
  • [link hidden]: The flagship repository for their state-of-the-art (SOTA) open-source TTS. It features the Fish Speech V1.5 and the newly announced S2 model, which supports fine-grained emotional control (e.g., [laugh, whispers]) and zero-shot voice cloning with as little as 10 seconds of audio.
  • [link hidden]: A framework for TTS, Singing Voice Synthesis (SVS), and Singing Voice Conversion (SVC) based on diffusion models. It is designed to be simpler and more modular than original diffusion-based SVC repositories.
  • [link hidden]: A utility toolkit for preparing audio datasets. It includes scripts for vocal separation, automatic slicing, loudness matching, and transcription via WhisperX or FunASR.
  • [link hidden]: A repository focused on the VITS2 backbone integrated with multilingual BERT for improved prosody and naturalness.
3. IndexTTS2 - top/industrial grade offline TTS
Working GitHub Repositories
  • index-tts/index-tts: The primary repository for the model, which provides an "industrial-level" zero-shot TTS system. It includes a Gradio-based web interface and can be set up using the uv package manager.
  • [link hidden]: A key source for the IndexTTS2 codebase and pre-trained weights, focusing on the model's novel duration adaptation scheme and emotion-speaker decoupling.
  • [link hidden]: An enhanced fork that adds specialized features like batch inference and more granular control over the speed of speech while maintaining compatibility with the original project.
  • [link hidden]: A lightweight wrapper for users who want to integrate IndexTTS2 voice cloning and emotion control directly into their ComfyUI workflows.
  • [link hidden]: A wrapper designed for deploying the model on Replicate using Cog, allowing for zero-shot speaker cloning via API.
4. Chatterbox - text-to-speech models by Resemble AI

Core & Official Repositories
  • resemble-ai/chatterbox: The official primary repository. It houses the original model (0.5B parameters), the Multilingual version (23 languages), and the latest Chatterbox-Turbo, which uses a streamlined 350M architecture for ultra-low latency.
  • resemble-ai/chatterbox_demopage: The official GitHub Pages site for listening to high-fidelity audio samples.

Self-Hosting & API Implementations
  • [link hidden]: A feature-rich self-hosting solution. It includes a modern Web UI, support for batch processing (audiobooks), and hot-swappable engines for the Original, Multilingual, and Turbo models.
  • [link hidden]: A FastAPI-powered REST API that provides OpenAI-compatible endpoints. It is designed for easy integration into existing applications as a drop-in replacement for OpenAI's TTS API.
  • [link hidden]: An alternative implementation that uses the uv package manager for fast installation and includes both a FastAPI server and a Streamlit UI.
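
For the OpenAI-compatible servers in the list above, a client call might look roughly like the sketch below. The request follows the shape of OpenAI's `/v1/audio/speech` endpoint; the base URL `http://localhost:8000`, and whether a given self-hosted server accepts the `tts-1` model name or the `alloy` voice, are assumptions, so check the server's own documentation. The helper name `build_speech_request` is hypothetical.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8000",
                         model="tts-1", voice="alloy"):
    """Build (but do not send) a POST request in the OpenAI speech format.

    base_url, model, and voice are placeholder assumptions for a local server.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "wav",
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("Hello from a local TTS server.")
print(req.get_method(), req.full_url)

# To actually send it, a server must be listening at base_url:
# with urllib.request.urlopen(req) as resp, open("out.wav", "wb") as f:
#     f.write(resp.read())
```

Because only the base URL changes, an app already written against the hosted API can usually be pointed at a local server without touching the rest of the code.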

Optimization & Specialized Tools
  • [link hidden]: A port of the model to the vLLM framework, achieving up to 10x speedup with batching and improved GPU memory efficiency.
  • [link hidden]: A fork specifically optimized for real-time streaming, reaching a latency to the first audio chunk of under 0.5 seconds on an RTX 4090.
  • [link hidden]: Custom nodes for integrating Chatterbox TTS and voice conversion directly into ComfyUI workflows.
  • [link hidden]: An implementation for the Model Context Protocol (MCP), allowing AI agents to generate and play speech automatically through a unified tool.
5. Kitten TTS - a minimalistic TTS with limited emotional range that runs fast in CPU mode
Core GitHub Repositories
  • [link hidden]: The official repository containing the core model code. It features multiple model tiers, including "Nano" (15M parameters), "Micro" (40M), and "Mini" (80M).
  • [link hidden]: A production-ready self-hosting server that adds a FastAPI backend, an intuitive Web UI, and GPU acceleration (not in the original version). It is ideal for generating audiobooks by splitting large texts.
  • [link hidden]: A repository focusing on developer integration and easy deployment for mobile or IoT devices. It includes prebuilt binaries and example voices for quick starts.
  • [link hidden]: A simple web-based demo project to test the model's capabilities in a browser environment.
  • [link hidden]: A specialized implementation for storytelling, utilizing KittenTTS for its natural prosody and low resource requirements.
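
Several of the servers above mention splitting large texts for audiobooks. A minimal sketch of that idea follows, assuming simple sentence boundaries (`.`, `!`, `?`) and a character budget per chunk; the function name `split_for_tts` and the `max_chars` default are hypothetical, and real projects may split differently.

```python
import re


def split_for_tts(text, max_chars=200):
    """Group sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


book = "This is the first sentence. Here is another one! And a third? Done."
for i, chunk in enumerate(split_for_tts(book, max_chars=40)):
    print(i, chunk)
```

Each chunk can then be sent to the TTS engine separately and the audio files joined afterwards, which keeps memory use low on small models.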
BONUS:
For minimalist users who have no GPU but decent RAM on their PC, this will do:
TTS-Studio: [link hidden]

A unified web-based interface for multiple text-to-speech models - featuring Kitten TTS, Piper TTS, and Kokoro TTS running entirely in your browser! Switch between models seamlessly and choose the perfect voice for your needs.
If you want an all-in-one that can handle all of those models, use [link hidden]. Models are here: [link hidden]. It supports [link hidden], but you need the ONNX model. You can convert it with their tool or just go here: [link hidden]. Look for the "[link hidden]" model for Tagalog in ONNX form. Here it is, to make things easy: [link hidden]
===================================================================================
Note:
  • The first 4 need a decent GPU and enough system RAM to run properly - see their requirements.
  • You can try voice cloning in Tagalog in IndexTTS2 (it's good at imitating, but not linguistically perfect). You can try it in XTTS2 as well.

I've noticed that a lot of people are selling these free apps with some cosmetic UI - not bad! But it's better to know how to use them yourself if you have the hardware.
===================================================================================
Good Luck.


