For those looking for offline TTS with decent emotions and expressions (low to high, controllable), try these.
1. XTTS2 - successor to Coqui
Recommended Repositories
- A clean, text-based voice cloning UI. It supports 16 languages and allows voice recording/uploading directly in the interface.
- A comprehensive local web interface with easy installation scripts for both Windows and Linux.
- A FastAPI-based server designed for integrating XTTSv2 into other applications like SillyTavern.
- Combines XTTSv2 with RVC (Retrieval-based Voice Conversion) to improve output quality beyond standard XTTS cloning.
- A local application specifically designed for creating audiobooks using XTTSv2.
2. Fish Audio - high-quality expression and emotions
(Available on their online website. The models might be lurking somewhere on the Hugging Face Hub, heh.)
Primary Repositories
- The flagship repository for their state-of-the-art (SOTA) open-source TTS. It features Fish Speech V1.5 and the newly announced S2 model, which supports fine-grained emotional control (e.g., [laugh, whispers]) and zero-shot voice cloning with as little as 10 seconds of audio.
- A framework for TTS, Singing Voice Synthesis (SVS), and Singing Voice Conversion (SVC) based on diffusion models. It is designed to be simpler and more modular than the original diffusion-based SVC repositories.
- A utility toolkit for preparing audio datasets. It includes scripts for vocal separation, automatic slicing, loudness matching, and transcription via WhisperX or FunASR.
- A repository focused on the VITS2 backbone integrated with multilingual BERT for improved prosody and naturalness.
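Since Fish Speech advertises zero-shot cloning from as little as 10 seconds of audio, and the dataset toolkit above does loudness matching, it helps to sanity-check a reference clip before feeding it to any of these cloners. Below is a minimal stdlib-only sketch, assuming a 16-bit PCM WAV file. The 10-second minimum comes from the description above; the -40 dBFS "too quiet" threshold and all function names are hypothetical, not part of any of these projects.

```python
import math
import struct
import wave

# From the Fish Speech description above: ~10 s of reference audio.
MIN_REFERENCE_SECONDS = 10.0
# Hypothetical threshold: below this RMS level the clip is probably too quiet.
MIN_RMS_DBFS = -40.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def clip_rms_dbfs(path: str) -> float:
    """Return RMS loudness in dBFS for a 16-bit PCM WAV (all channels pooled)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    if count == 0:
        return float("-inf")
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    # Silence has no finite dB value, so report -inf.
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def check_reference_clip(path: str) -> list[str]:
    """Return a list of human-readable problems (empty list means the clip looks OK)."""
    problems = []
    duration = clip_duration_seconds(path)
    if duration < MIN_REFERENCE_SECONDS:
        problems.append(f"too short: {duration:.1f}s < {MIN_REFERENCE_SECONDS:.0f}s")
    loudness = clip_rms_dbfs(path)
    if loudness < MIN_RMS_DBFS:
        problems.append(f"too quiet: {loudness:.1f} dBFS")
    return problems
```

Usage: `print(check_reference_clip("my_voice.wav"))` before uploading or pointing a cloning UI at the file. Only duration and loudness are checked here; background noise and music still need the vocal-separation step.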
3. IndexTTS2 - top-tier, industrial-grade offline TTS
Working GitHub Repositories
- index-tts/index-tts: The primary repository for the model, which provides an "industrial-level" zero-shot TTS system. It includes a Gradio-based web interface and can be set up using the uv package manager.
- A key source for the IndexTTS2 codebase and pre-trained weights, focusing on the model's novel duration adaptation scheme and emotion-speaker decoupling.
- An enhanced fork that adds specialized features like batch inference and more granular control over the speed of speech while maintaining compatibility with the original project.
- A lightweight wrapper for users who want to integrate IndexTTS2 voice cloning and emotion control directly into their ComfyUI workflows.
- A wrapper designed for deploying the model on Replicate using Cog, allowing for zero-shot speaker cloning via API.
4. Chatterbox - text-to-speech models by Resemble AI
Core & Official Repositories
- resemble-ai/chatterbox: The official primary repository. It houses the original model (0.5B parameters), the Multilingual version (23 languages), and the latest Chatterbox-Turbo, which uses a streamlined 350M architecture for ultra-low latency.
- resemble-ai/chatterbox_demopage: The official GitHub Pages site for listening to high-fidelity audio samples.
Self-Hosting & API Implementations
- A feature-rich self-hosting solution. It includes a modern Web UI, support for batch processing (audiobooks), and hot-swappable engines for the Original, Multilingual, and Turbo models.
- A FastAPI-powered REST API that provides OpenAI-compatible endpoints. It is designed for easy integration into existing applications as a drop-in replacement for OpenAI’s TTS API.
- An alternative implementation that uses the uv package manager for fast installation and includes both a FastAPI server and a Streamlit UI.
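Because one of the servers above exposes OpenAI-compatible endpoints, a client should in principle be able to call it the same way it calls OpenAI's `/v1/audio/speech` endpoint (JSON body with `model`, `input`, `voice`, `response_format`; audio bytes back). Here's a minimal sketch using only the Python standard library. The host and port (`localhost:8000`) and whether the server accepts the `tts-1` model name and `alloy` voice are assumptions; check the server's own README for the real values.

```python
import json
import urllib.request

# Hypothetical address of the self-hosted server; adjust to what it actually prints on startup.
BASE_URL = "http://localhost:8000"


def build_speech_request(text: str, voice: str = "alloy", model: str = "tts-1",
                         response_format: str = "wav") -> urllib.request.Request:
    """Build a POST request shaped like OpenAI's /v1/audio/speech call."""
    payload = {
        "model": model,                  # assumption: server accepts OpenAI-style model names
        "input": text,
        "voice": voice,                  # assumption: voice names depend on the server
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, **kwargs) -> None:
    """Send the request and save the returned audio bytes (needs the server running)."""
    req = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```

Usage: `synthesize("Magandang araw!", "out.wav")`. The point of an OpenAI-compatible server is that existing OpenAI TTS client code only needs its base URL changed.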
Optimization & Specialized Tools
- A port of the model to the vLLM framework, achieving up to 10x speedup with batching and improved GPU memory efficiency.
- A fork specifically optimized for real-time streaming, reaching a time to first audio chunk of under 0.5 seconds on an RTX 4090.
- Custom nodes for integrating Chatterbox TTS and voice conversion directly into ComfyUI workflows.
- An implementation for the Model Context Protocol (MCP), allowing AI agents to generate and play speech automatically through a unified tool.
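If you want to check streaming claims like "under 0.5 seconds to the first audio chunk" on your own hardware, the metric is easy to measure yourself. A minimal sketch that times any iterable of audio chunks; `my_stream` in the usage note is a hypothetical stand-in for whatever streaming generator the fork actually exposes.

```python
import time
from typing import Iterable, Optional


def measure_stream(chunks: Iterable[bytes]) -> dict:
    """Consume an audio-chunk stream and report time to first chunk and total time."""
    start = time.perf_counter()
    first_chunk_s: Optional[float] = None
    count = 0
    total_bytes = 0
    # For a generator, synthesis work only starts on the first next() call,
    # so it is included in the timing.
    for chunk in chunks:
        if first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start
        count += 1
        total_bytes += len(chunk)
    return {
        "first_chunk_s": first_chunk_s,   # None if the stream was empty
        "total_s": time.perf_counter() - start,
        "chunks": count,
        "bytes": total_bytes,
    }
```

Usage: `print(measure_stream(my_stream("Hello, world.")))`. Run it a few times and ignore the first run, since model warm-up usually inflates the first measurement.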
5. Kitten TTS - minimalist TTS with limited emotional range that runs fast on CPU
Core GitHub Repositories
- The official repository containing the core model code. It features multiple model tiers, including "Nano" (15M parameters), "Micro" (40M), and "Mini" (80M).
- A production-ready self-hosting server that adds a FastAPI backend, an intuitive Web UI, and GPU acceleration (not present in the original). It is ideal for generating audiobooks by splitting large texts.
- A repository focusing on developer integration and easy deployment for mobile or IoT devices. It includes prebuilt binaries and example voices for quick starts.
- A simple web-based demo project to test the model's capabilities in a browser environment.
- A specialized implementation for storytelling, using KittenTTS for its natural prosody and low resource requirements.
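Audiobook tools like the server above work by splitting large texts into pieces the model can handle in one go. A minimal sketch of that idea: split on sentence boundaries, hard-split any overly long sentence at spaces, then pack sentences into chunks. The 250-character default is an assumption for illustration, not a documented limit of KittenTTS or any other model listed here.

```python
import re


def split_for_tts(text: str, max_chars: int = 250) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    pieces: list[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Hard-split a sentence that alone exceeds the limit, breaking at the last space.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space available: cut mid-word
            pieces.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if sentence:
            pieces.append(sentence)

    # Greedily pack whole sentences (or sentence fragments) into chunks.
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized separately and the audio files are concatenated in order. Splitting at sentence ends keeps the prosody more natural than cutting at a fixed character count.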
BONUS:
For minimalist users without a GPU but with decent RAM on their PC, this will do:
TTS-Studio
A unified web-based interface for multiple text-to-speech models, featuring Kitten TTS, Piper TTS, and Kokoro TTS, running entirely in your browser! Switch between models seamlessly and choose the right voice for your needs.
If you want an all-in-one (AIO) app that can handle all of those models, use [link]. The models are here: [link]. It also supports [link], but you need the ONNX model. You can convert it with their tool, or just go here: [link]. Look for the Tagalog model in ONNX form. Here it is, to make it easy: [link]
===================================================================================
Note:
- The first four need a decent GPU and enough system RAM to run properly; see their requirements.
- You can try Tagalog voice cloning in IndexTTS2 (it imitates voices well, but it is not linguistically perfect). You can also try it in XTTS2.
I'm sharing this because I've noticed a lot of people selling these free apps with a cosmetic UI on top. Not bad! But if you have the hardware, it's better to know how to use them yourself.
===================================================================================
Good luck.
