Zhipu AI Wraps Multimodal Open Source Week: Four Core Multimodal Technologies Fully Open-Sourced, Paving the Way for Next-Gen AI Filmmaking
On December 13, 2025, Zhipu AI concluded its "Multimodal Open Source Week" by open-sourcing four pivotal technologies behind its video generation and AI filmmaking stack: GLM-4.6V for visual understanding, AutoGLM for intelligent device control, GLM-ASR for high-fidelity speech recognition, and GLM-TTS for expressive speech synthesis. Now freely available on GitHub and Hugging Face, these modules enable end-to-end multimodal pipelines that fuse perception, reasoning, audio, and action, lowering the barrier for developers building interactive video agents, embodied AI, and cinematic tools.
Zhipu AI Open-Sources AutoGLM: The Phone Agent Framework That Turns Every Smartphone into a True AI Phone
On December 9, 2025, Zhipu AI (Z.ai) fully open-sourced AutoGLM, a phone-agent model and framework that autonomously operates Android devices from natural-language commands. Powered by the AutoGLM-Phone-9B multimodal model, it interprets screenshots, plans multi-step actions, and executes taps, swipes, and text input across 50+ popular Chinese apps such as WeChat, Taobao, Douyin, and Meituan. Deployable locally or in the cloud via ADB, the release democratizes "AI phone" capabilities, challenges closed ecosystems, and lets developers build privacy-focused agents on any device.
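The perceive-plan-act loop described above can be sketched in a few lines. This is a minimal illustration, not AutoGLM's actual code: the JSON action schema and the `agent_step` helper are hypothetical assumptions for this example; only the standard ADB commands (`adb exec-out screencap -p`, `adb shell input tap/swipe/text`) are real.

```python
import subprocess

def action_to_adb_args(action):
    """Translate a planner action dict into an adb argv list.

    The action format here is a hypothetical example of what a planning
    model might emit, e.g.:
      {"type": "tap", "x": 540, "y": 1200}
      {"type": "swipe", "x1": 540, "y1": 1600, "x2": 540, "y2": 400, "ms": 300}
      {"type": "text", "value": "hello"}
    """
    t = action["type"]
    if t == "tap":
        return ["adb", "shell", "input", "tap",
                str(action["x"]), str(action["y"])]
    if t == "swipe":
        return ["adb", "shell", "input", "swipe",
                str(action["x1"]), str(action["y1"]),
                str(action["x2"]), str(action["y2"]),
                str(action.get("ms", 300))]
    if t == "text":
        return ["adb", "shell", "input", "text", action["value"]]
    raise ValueError(f"unknown action type: {t}")

def capture_screenshot():
    """Grab the current screen as PNG bytes via adb (needs a connected device)."""
    return subprocess.run(["adb", "exec-out", "screencap", "-p"],
                          capture_output=True, check=True).stdout

def agent_step(planned_action):
    """One perceive->act cycle: in the real system the screenshot is sent to
    the multimodal model, which returns the next action; here the planning
    call is omitted and the given action is executed on-device."""
    subprocess.run(action_to_adb_args(planned_action), check=True)
```

For instance, `agent_step({"type": "tap", "x": 540, "y": 1200})` would issue `adb shell input tap 540 1200` against the connected device; a full agent would loop this with model-planned actions until the task completes.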


