Zhipu AI Open-Sources GLM-TTS: Multi-Reward RL-Powered TTS That Clones Voices from 3 Seconds of Audio with Emotional Depth
Zhipu AI released and open-sourced GLM-TTS on December 11, 2025 — an industrial-grade text-to-speech system that clones any voice from just 3 seconds of audio, delivering natural prosody, emotional expressiveness, and precise pronunciation. Powered by a two-stage architecture and multi-reward reinforcement learning (GRPO framework), it hits open-source SOTA on character error rate (0.89%) and emotional fidelity using only 100K hours of training data. Weights are now available on GitHub and Hugging Face, with seamless integration into Zhipu's ecosystem for audiobooks, assistants, and dubbing.
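The announcement credits a GRPO-style multi-reward setup for the gains in pronunciation accuracy and emotional fidelity. As a rough sketch of that idea only (the reward terms, weights, and shaping below are assumptions, not Zhipu's published recipe), group-relative advantages for a batch of sampled utterances might be computed like this:

```python
import numpy as np

def multi_reward_grpo_advantages(cer_scores, emotion_scores, w_cer=0.5, w_emo=0.5):
    """Group-relative advantages for one prompt's sampled utterances.

    cer_scores:     character error rate per sample (lower is better)
    emotion_scores: emotion-similarity score per sample (higher is better)
    The weights and reward shaping are illustrative assumptions, not
    GLM-TTS's published configuration.
    """
    cer = np.asarray(cer_scores, dtype=np.float64)
    emo = np.asarray(emotion_scores, dtype=np.float64)

    # Blend the rewards; CER is negated so that higher is better for both terms.
    rewards = w_cer * (-cer) + w_emo * emo

    # GRPO-style normalization: each sample's advantage is measured
    # relative to the mean and spread of its own sampling group.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: four sampled utterances for a single text prompt.
advantages = multi_reward_grpo_advantages(
    cer_scores=[0.012, 0.009, 0.031, 0.008],
    emotion_scores=[0.71, 0.83, 0.65, 0.78],
)
print(advantages)  # positive entries are the samples a policy update would reinforce
```

The key property of the group-relative scheme is that each sample is scored against its own sampling group, so no separate value network is needed.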
FaceWall Intelligence's VoxCPM 1.5 Goes Open-Source: Voice Generation Gets a Massive Upgrade in Naturalness, Emotion, and Control
On December 12, 2025, FaceWall Intelligence (ModelBest) open-sourced VoxCPM 1.5, its next-generation text-to-speech model, with major gains in naturalness, emotional depth, and fine-grained control. Supporting multilingual synthesis, prosody adjustment, and zero-shot voice cloning, it outperforms ElevenLabs and XTTS v2 in blind tests while remaining fully open-weight. The model is now live on GitHub and Hugging Face, and early adopters are already deploying it for audiobooks, dubbing, and real-time voice agents.
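For a sense of what the zero-shot cloning workflow looks like in practice, here is a minimal sketch. The `tts_model` object and its `synthesize` call are placeholders, not VoxCPM 1.5's actual inference API; the released model card defines the real entry point.

```python
# Minimal zero-shot voice-cloning sketch. `tts_model` and its `synthesize`
# method are placeholders; consult the released model card for the real API.
import torchaudio

def clone_and_speak(tts_model, text, reference_wav, output_path="cloned.wav"):
    """Synthesize `text` in the voice of a short reference recording."""
    # A few seconds of clean reference audio is typically enough for zero-shot cloning.
    reference, sample_rate = torchaudio.load(reference_wav)

    # Placeholder call: pass the text plus the reference clip to the model.
    waveform, out_sr = tts_model.synthesize(
        text=text,
        reference_audio=reference,
        reference_sample_rate=sample_rate,
    )

    torchaudio.save(output_path, waveform, out_sr)
    return output_path
```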


