Mulu Code
Every top frontier model, real-browser verification on every change, and the lowest cost per task we have ever shipped. Built for builders who want power without the babysitting.
From $20 / month · All top frontier models included
Every change Mulu makes is tested in a real browser before you see it. It clicks through the change, watches the result, and saves a video of the run.
When the agent says it's done, there's footage. When something breaks, you have the replay.
Mulu reads your project once and remembers it. The agent picks the right files without being told, and ships the same change for a fraction of the tokens other tools spend.
Less re-reading. Less re-explaining. Lower bill every time you ship.
GPT, Claude, Gemini, Kimi, GLM, Grok, Qwen, DeepSeek, MiniMax, Nemotron, and Mulu's own. Switch mid-task. No separate keys. No separate bills. Nothing routed offshore.
Send a hard task to Opus. It writes the plan, splits the work, and dispatches it to three Kimi K2.6 workers running in parallel. You stop paying frontier prices to do execution that a cheaper model can handle.
You write one prompt. The right model plans, the right models build, and you get the bill for what each step actually needs.
Put Claude Opus, GPT-5.4, and Gemini 3.1 Pro on the same task. Each one takes the part it's strongest at. They share a working memory and call each other in when stuck.
You get the upside of every model, without picking just one.
Mulu picks the workflow for you, or you pick it yourself.
Spin up workers that each take a slice. Each runs in its own branch. Mulu merges and reruns verification across the whole set.
Auto-traces every run. When something breaks, Mulu replays the failure so the agent sees what you saw, then ships a verified fix.
For complex unknowns, dispatch a team of agents that reads docs, scans the repo, and returns a structured brief. You pick a direction.
Generate a plan in plain English. Edit it. Approve it. Then ship the whole plan in one run, with verification on every step.
When you want an answer instead of a change. The agent reads your code, answers, and changes nothing.
Hand off a big job and walk away. Mulu keeps working, sends a video when it's done, and flags anything ambiguous before it ships.
I built Mulu because every tool I tried lied to me about what it actually did.
The AI would tell me a feature was done. I'd check, it wasn't. It would say tests passed. They didn't. It would forget what we did yesterday by the time we picked it up today. The tools were powerful but you could never trust the output, and trust is the whole point.
So Mulu does two things differently. It tests every change in a real browser and saves the video. And it remembers your project across sessions so it never starts from scratch. Power, plus proof, plus memory.
If that sounds like the tool you wish you had, get on the list. We open access in waves and you'll hear from us soon.