Zhipu AI Wraps Multimodal Open Source Week: Four Core Video Generation Technologies Fully Open-Sourced — Paving the Way for Next-Gen AI Filmmaking

On December 13, 2025, Zhipu AI concluded its "Multimodal Open Source Week" with a bang — open-sourcing four pivotal technologies powering advanced video generation: GLM-4.6V for visual understanding, AutoGLM for intelligent device control, GLM-ASR for high-fidelity speech recognition, and GLM-TTS for expressive speech synthesis. These modules, now freely available on GitHub and Hugging Face, enable end-to-end multimodal pipelines that fuse perception, reasoning, audio, and action — slashing barriers for developers building interactive video agents, embodied AI, and cinematic tools.

Zhipu AI Launches and Open-Sources GLM-4.6V Series: Native Multimodal Tool Calling Turns Vision into Action — The True Agentic VLM Revolution

On December 8, 2025, Zhipu AI officially released and fully open-sourced the GLM-4.6V series of multimodal models, including the high-performance GLM-4.6V (106B total parameters, 12B active) and the lightweight GLM-4.6V-Flash (9B). The series introduces native multimodal function calling, in which images are passed directly as tool parameters and tool results feed back into the model's context, along with a 128K-token window for handling 150-page documents or hour-long videos, and it achieves SOTA on 30+ benchmarks at comparable scales. API prices have been cut by 50%, the Flash version is free for commercial use, and weights and code are now on GitHub and Hugging Face, igniting a frenzy for visual agents in coding, shopping, and content creation.
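To make the "images as tool parameters" idea concrete, below is a minimal sketch of what such a request could look like against an OpenAI-compatible chat-completions endpoint. The base URL, the model identifier `glm-4.6v-flash`, and the `search_product` tool schema are illustrative assumptions for this sketch, not details confirmed by the announcement.

```python
# Sketch: send an image plus a declared tool to a vision-language model via an
# OpenAI-compatible chat-completions API, and read back any structured tool call.
# Assumptions (not from the announcement): endpoint URL, model name, tool schema.
import base64
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                            # placeholder credential
    base_url="https://open.bigmodel.cn/api/paas/v4/",  # assumed OpenAI-compatible endpoint
)

# Encode a local screenshot so it can be passed inline as part of the prompt.
with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# Declare a tool the model may choose to call after inspecting the image.
tools = [{
    "type": "function",
    "function": {
        "name": "search_product",                      # hypothetical tool, for illustration
        "description": "Search an online store for the product shown in the image.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="glm-4.6v-flash",                            # hypothetical model identifier
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Identify this product and search for it."},
        ],
    }],
    tools=tools,
)

# If the model decides to act, it returns a structured tool call rather than prose.
print(response.choices[0].message.tool_calls)
```

In this pattern the model grounds its tool call in what it sees in the image, which is the behavior the "vision into action" framing refers to; the tool's result can then be appended to the conversation as further context for the next turn.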
