StepFun Open-Sources GELab-Zero 4B: The Lightweight GUI Agent That Conquers All Android Devices with Zero-Shot Automation Magic
Category: Tool Dynamics
Excerpt:
StepFun AI launched GELab-Zero on November 28, 2025: a fully open-source 4B-parameter multimodal GUI agent model designed for autonomous Android control, now available on Hugging Face under Apache 2.0. Featuring plug-and-play infrastructure for ADB handling, task recording, and local inference on consumer hardware (as low as 16GB VRAM), it achieves 75.86% success on AndroidWorld benchmarks for complex tasks like ride-hailing and shopping. The model runs offline on a local PC and drives any connected Android device over ADB, slashing latency while preserving privacy. It is a direct challenge to cloud-heavy agents like Claude's Computer Use, empowering devs to automate mobile workflows without vendor lock-in.
🚨 GELab-Zero: StepFun’s 4B Param Local GUI Agent — Android Autonomy, No Cloud Strings Attached
The GUI agent apocalypse just went fully local — and StepFun's dropping a 4B bomb that fits in your pocket.
GELab-Zero isn't a half-baked screenshot parser; it's StepFun's end-to-end rebellion against cloud-chained agents, a 4B-param vision-language powerhouse that ingests device captures, reasons like a pro user, and executes taps/scrolls via ADB with zero-shot flair. Unveiled amid 2025's agentic surge (post-Microsoft's Fara-7B), this preview model — hosted on Hugging Face and ModelScope — bundles inference plumbing, dependency wrangling, and replay tools into a one-click Python launcher, turning any PC into an Android overlord.
No APIs, no subscriptions: just local runs on modest rigs, clocking sub-second decisions for tasks from food orders to message blasts. Early forks? Exploding for custom automations in testing and accessibility, with benchmarks proving it's no toy — 75.86% on AndroidWorld's real-world gauntlet, outpacing UI-TARS by 15%.

🔧 The Plug-and-Play Pipeline: Android on Autopilot
GELab-Zero's secret? A streamlined stack that offloads the grunt work, letting the model focus on smarts:
Multimodal Mastery
Processes screenshots + text prompts into grounded actions (e.g., "tap search bar, type 'Uber'") with pixel-precise bounding boxes — zero-shot generalization across apps.
Local Inference Lightning
Quantized 4B core runs on consumer GPUs/CPUs (Ollama integration), handling 1M-token contexts without cloud pings; privacy fortress with full offline control.
ADB Automation Arsenal
Built-in handlers for device connects, task logging/replays, and error recovery — one command spins up sessions for ride-hailing sims or shopping carts.
Benchmark Backbone
Trained on AndroidDaily (self-built dataset of 10K+ trajectories), it nails 73.4% on static evals, surging to 75%+ in dynamic flows like multi-app handoffs.
The kicker? Extensible via MCP-like hooks — devs swap strategies mid-run, iterating agents without rebuilds.
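To make the ADB layer concrete, here is a minimal Python sketch of how a grounded action might be translated into a device command. The action dictionary shape and the `adb_command` helper are hypothetical illustrations, not GELab-Zero's actual schema or API; only the underlying `adb shell input tap/swipe/text` commands are standard Android tooling.

```python
from typing import Dict, List


def adb_command(device_id: str, action: Dict) -> List[str]:
    """Translate one action dict (hypothetical schema) into an adb argv list."""
    base = ["adb", "-s", device_id, "shell", "input"]
    kind = action["type"]
    if kind == "tap":
        return base + ["tap", str(action["x"]), str(action["y"])]
    if kind == "swipe":
        return base + [
            "swipe",
            str(action["x1"]), str(action["y1"]),
            str(action["x2"]), str(action["y2"]),
            str(action.get("duration_ms", 300)),  # swipe duration in milliseconds
        ]
    if kind == "type":
        # `input text` interprets %s as a space character
        return base + ["text", action["text"].replace(" ", "%s")]
    raise ValueError(f"unsupported action type: {kind!r}")


if __name__ == "__main__":
    # Print the command instead of running it, so no device is required.
    print(adb_command("emulator-5554", {"type": "tap", "x": 540, "y": 1200}))
```

With a device attached, the returned list could be passed to `subprocess.run(cmd, check=True)`; building the argv separately keeps the mapping easy to inspect and test without hardware.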
🖥️ Interface: A Dev's One-Click Nirvana
Clone from GitHub, pip install, and fire:

    python gelab_zero.py --device-id your_android

The CLI blooms a dashboard with:
- Live screenshots
- Action previews
- JSON outputs for chaining
Prompt "book a cab to airport under $20," upload a capture, and watch it scroll, tap, and confirm — with replay videos for debug bliss.
Mid-session, @gelab refine nudges the run (say, toward a budget filter) without a restart, and logs export as scriptable files for batch automations. On Ollama? Curl a test prompt such as "What to click for Settings?" with an image attached and get a reply in seconds.
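The Ollama test described above could look roughly like this in Python. This is a sketch under assumptions: the model tag `gelab-zero-4b-preview` is a hypothetical placeholder (use whatever name the model is registered under locally), while the `/api/generate` endpoint on port 11434 and its `images` field of base64 strings follow Ollama's documented REST API.

```python
import base64
import json
import urllib.request
from typing import Dict

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_NAME = "gelab-zero-4b-preview"  # hypothetical tag, adjust to your local model name


def build_request(prompt: str, image_bytes: bytes, model: str = MODEL_NAME) -> Dict:
    """Build a non-streaming generate payload with one attached screenshot."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def ask_ollama(payload: Dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to a running Ollama server and return the text reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

A call such as `ask_ollama(build_request("What to click for Settings?", open("screen.png", "rb").read()))` would then return the model's answer, assuming an Ollama server is running and the model has been pulled or imported.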
Pro tip: it runs on Windows, macOS, and Linux alike, with AR previews for mobile testing.
📈 Early Metrics: A Local Agent Landslide
Launch lit the fuse — the numbers speak for themselves:
| Metric | Statistic |
|---|---|
| Hugging Face Downloads | 500K+ in days |
| GitHub Stars | 20K |
| Generated Actions Adoption | 80% |
| AndroidWorld Score | 75.86% (vs. baselines' 60%) |
| AndroidDaily Static Eval | 73.4% |
| App Testing Speed | 5x faster (dev reports) |
| Accessibility Tool Setup | 90% reduction in time |
Community raves: "Finally, an agent that ships to my phone without phoning home," per Reddit threads.
🛡️ Guardrails and the Offline Odyssey
StepFun's vigilant — safety isn't an afterthought:
- RLHF-tuned for safe actions (98% no-harm rate)
- Traceable logs for full auditability
- Bias checks on diverse UIs (no app-specific leaks)
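For a sense of what traceable, replayable logs can look like in practice, here is a small Python sketch that appends each executed action to a JSON Lines file and reads it back in order. The record fields and the `log_step` and `replay` helpers are hypothetical; GELab-Zero's own log format may differ.

```python
import json
import time
from pathlib import Path
from typing import Dict, Iterator


def log_step(log_path: Path, step: int, action: Dict) -> None:
    """Append one executed action to a JSON Lines audit log."""
    record = {"step": step, "ts": time.time(), "action": action}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def replay(log_path: Path) -> Iterator[Dict]:
    """Yield logged actions in the order they were recorded."""
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                yield json.loads(line)["action"]
```

An append-only file with one record per line keeps every step auditable after the fact and makes a replay as simple as iterating over the file.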
Pain points? Support is capped at standard Android for now (custom ROM support is teased), and noisy captures hurt accuracy, so clean screenshots matter.
Roadmap teases: GELab-1.0 with iOS ports and multi-device swarms.
🌐 Ecosystem Earthquake
This detonates like a debug bomb in Anthropic's cloud-dominated landscape:
- While Claude dreams desktop dominion, GELab-Zero's local 4B thrift democratizes mobile agency
- Arms indies for QA bots and enterprises for compliance crawls
- Hugging Face remixes flood Gitee; expect unions with Qwen for bilingual blasts
StepFun's manifesto? GUI agents aren't gated by grids — they're galactic, and Zero's the zero-barrier launchpad.
GELab-Zero's open-source blitz isn't a model — it's the liberation of local lords, where 4B params puppeteer Android empires offline, collapsing cloud crutches into consumer conquests. By bundling brains with brawn in plug-and-play purity, StepFun isn't automating tasks; it's automating autonomy, from solo scripts to swarm strategies.
As ADBs ignite and actions avalanche, the agentic dawn breaks: mobile's no longer manual — it's masterful, meticulously mobilized, one zero-shot tap at a time.
Official Links
- Download GELab-Zero on Hugging Face → https://huggingface.co/stepfun-ai/GELab-Zero-4B-preview
- GitHub Repo & Setup Guide → https://github.com/stepfun-ai/gelab-zero
- Try via Ollama → https://ollama.com










