Meta Launches SAM Audio: The First Unified Multimodal Model That Isolates Any Sound from Complex Mixtures with Intuitive Prompts

Meta unveiled SAM Audio on December 16, 2025, extending its Segment Anything family into audio with what it calls the world's first unified multimodal model for sound separation. The model isolates specific sounds such as vocals, instruments, or ambient noise from complex real-world mixtures using text descriptions, visual clicks in videos, or time-span markings, alone or in combination, in a single prompt-driven workflow. Open-sourced in small, base, and large variants alongside the PE-AV perception encoder and the new SAM Audio-Bench benchmark suite, it is live now on the Segment Anything Playground and Hugging Face, lowering barriers for creators and accelerating work in editing, accessibility, and beyond.
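Meta has not published SAM Audio's internals in this announcement, but sound separation systems generally work by estimating a mask over a frequency (or time-frequency) representation of the mixture and applying it to recover the target source. As illustrative background only, and not Meta's implementation, here is a minimal NumPy sketch where a frequency mask plays the role of a "prompt" selecting the sound to keep:

```python
import numpy as np

# Synthetic one-second mixture of two pure tones at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
low = np.sin(2 * np.pi * 220 * t)    # target: a low tone at 220 Hz
high = np.sin(2 * np.pi * 2200 * t)  # interference: a high tone at 2200 Hz
mix = low + high

# Move to the frequency domain.
spec = np.fft.rfft(mix)
freqs = np.fft.rfftfreq(len(mix), d=1 / sr)

# A crude "prompt": a binary mask that keeps only content below 1 kHz.
mask = freqs < 1000.0

# Apply the mask and return to the time domain.
isolated = np.fft.irfft(spec * mask, n=len(mix))

# With exact-bin tones, the residual error versus the clean target is tiny.
error = np.max(np.abs(isolated - low))
```

Real separation models replace the hand-written mask with one predicted by a neural network conditioned on the prompt (text, click, or time span), and operate on short-time spectrograms rather than a single FFT, but the mask-and-reconstruct structure is the same.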

