Mulu Code · v1

The most powerful AI app builder, per task.

Every top frontier model, real-browser verification on every change, and the lowest cost per task we have ever shipped. Built for builders who want power without the babysitting.

From $20 / month · All top frontier models included

Send me build-in-public updates

$0.04 · Avg cost / task

100% · Changes verified

4× · Cheaper / task vs Claude Code

Context window

01 / Verification

Records the work. Proves it works.

Every change Mulu makes is tested in a real browser before you see it. It clicks through the change, watches the result, and saves a video of the run.

When the agent says it's done, there's footage. When something breaks, you have the replay.

02 / Cost

Cheapest per task.

Mulu reads your project once and remembers it. The agent picks the right files without being told, and ships the same change for a fraction of the tokens other tools spend.

Less re-reading. Less re-explaining. Lower bill every time you ship.

Cost per task

Claude Code · $0.16

Cursor · $0.12

Mulu Code · $0.04

Tokens spent per 1k LOC change

Claude Code · 218k

Cursor · 148k

Mulu Code · 47k

Internal benchmark, 40 tasks. Lower is better.
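The headline ratios can be checked from the benchmark figures alone. A quick sketch, using only the numbers listed in this section (the helper name `ratio_vs_mulu` is ours, purely for illustration):

```python
# Figures copied from the benchmark in this section (internal benchmark, 40 tasks).
cost_per_task = {"Claude Code": 0.16, "Cursor": 0.12, "Mulu Code": 0.04}
tokens_per_1k_loc = {"Claude Code": 218_000, "Cursor": 148_000, "Mulu Code": 47_000}


def ratio_vs_mulu(table):
    """Return how many times larger each tool's figure is than Mulu Code's."""
    base = table["Mulu Code"]
    return {
        tool: round(value / base, 1)
        for tool, value in table.items()
        if tool != "Mulu Code"
    }


cost_ratios = ratio_vs_mulu(cost_per_task)
token_ratios = ratio_vs_mulu(tokens_per_1k_loc)

print(cost_ratios)   # {'Claude Code': 4.0, 'Cursor': 3.0}
print(token_ratios)  # {'Claude Code': 4.6, 'Cursor': 3.1}
```

So the 4× figure is the cost ratio against Claude Code, and the token gap on the same benchmark is slightly wider, at roughly 4.6×.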

03 / Models

Every top frontier model. On US servers.

GPT, Claude, Gemini, Kimi, GLM, Grok, Qwen, DeepSeek, MiniMax, Nemotron, and Mulu's own. Switch mid-task. No separate keys. No separate bills. Nothing routed offshore.

Mulu Agent 1 · Fullstack · 256K

Claude Sonnet 4.6 · Anthropic · 1M

Claude Opus 4.6 · Anthropic · 1M

Claude Haiku 4.5 · Anthropic · 200K

GPT-5.4 · OpenAI · 1M

GPT-5.3 Codex · OpenAI · 400K

Gemini 3.1 Pro · Google · Deep Think

Gemini 3 Flash · Google · 1M

Grok 4.2 · xAI · 2M

Kimi K2.6 · Moonshot · 256K

MiniMax M2.7 · MiniMax · reasoning

Qwen 3.6 Plus · Alibaba · 1M

GLM-5.1 · Zhipu · 205K

DeepSeek V4 Pro · US-hosted

Nemotron 3 Super · NVIDIA · 1M

04 / Orchestrator

One model plans. Three execute.

Send a hard task to Opus. It writes the plan, splits the work, and dispatches it to three Kimi K2.6 workers running in parallel. You stop paying frontier prices to do execution that a cheaper model can handle.

You write one prompt. The right model plans, the right models build, and you get the bill for what each step actually needs.

Step 1 · Planning

Planner · Claude Opus 4.6

Step 2 · Execution, parallel

Worker 1 · Kimi K2.6

Worker 2 · Kimi K2.6

Worker 3 · Kimi K2.6
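For the curious, the plan-then-fan-out shape looks roughly like this. It is a minimal sketch of the pattern, not Mulu's actual code: `plan`, `execute`, and `orchestrate` are hypothetical stand-ins, and no real model is called.

```python
import asyncio


async def plan(task: str, n_workers: int) -> list[str]:
    """Hypothetical planner: split one task into n_workers subtasks."""
    # A real planner would call a frontier model here; this stub only labels slices.
    return [f"{task} [part {i + 1}/{n_workers}]" for i in range(n_workers)]


async def execute(worker_id: int, subtask: str) -> str:
    """Hypothetical worker: stands in for a cheaper model doing the execution."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"worker {worker_id} finished: {subtask}"


async def orchestrate(task: str, n_workers: int = 3) -> list[str]:
    """Plan once, then run all workers concurrently."""
    subtasks = await plan(task, n_workers)
    # gather runs the workers concurrently and returns results in subtask order.
    return await asyncio.gather(
        *(execute(i + 1, s) for i, s in enumerate(subtasks))
    )


results = asyncio.run(orchestrate("Build a checkout page"))
for line in results:
    print(line)
```

The point of the shape: the expensive model runs once, for the plan, and the parallel step is where the cheaper model does the bulk of the work.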

05 / Team mode

A team of models, not just one.

Put Claude Opus, GPT-5.4, and Gemini 3.1 Pro on the same task. Each one takes the part it's strongest at. They share a working memory and call each other in when stuck.

You get the upside of every model, without picking just one.

You

Build a Stripe checkout page that handles webhooks and emails the customer.

Opus 4.6

I'll take the page layout and the form validation. Handing the webhook handler to GPT.

GPT-5.4

On the webhook. I'll wire it to Stripe, verify signatures, and write the retry logic.

Gemini 3.1 Pro

Picking up the email template and the transactional send. Will share a draft in a minute.

06 / Modes

A mode for every job.

Mulu picks the workflow for you, or you pick it yourself.

Swarm

Hundreds of changes at once.

Spin up workers that each take a slice. Each runs in its own branch. Mulu merges and reruns verification across the whole set.

Debug

Find it. Replay it. Fix it.

Auto-traces every run. When something breaks, Mulu replays the failure so the agent sees what you saw, then ships a verified fix.

Research

Send a team, get a brief.

For complex unknowns, dispatch a team of agents that reads docs, scans the repo, and returns a structured brief. You pick a direction.

Plan

Think first. Build second.

Generate a plan in plain English. Edit it. Approve it. Then ship the whole plan in one run, with verification on every step.

Ask

Just a question, no edits.

When you want an answer instead of a change. The agent reads your code, answers, and changes nothing.

Agent

Long-running tasks.

Hand off a big job and walk away. Mulu keeps working, sends a video when it's done, and flags anything ambiguous before it ships.

07 / Founder note

I built Mulu because every tool I tried lied to me about what it actually did.

The AI would tell me a feature was done. I'd check, and it wasn't. It would say tests passed. They hadn't. It would forget what we did yesterday by the time we picked it up today. The tools were powerful, but you could never trust the output, and trust is the whole point.

So Mulu does two things differently. It tests every change in a real browser and saves the video. And it remembers your project across sessions so it never starts from scratch. Power, plus proof, plus memory.

If that sounds like the tool you wish you had, get on the list. We open access in waves and you'll hear from us soon.

Josh, Founder of Mulu Code

Questions, answered.

Q.01

When does it launch?

Soon. We're inviting people in waves through the private beta and opening it up as the queue clears.

Q.02

How much does it cost?

From $20 a month. Every feature is included. No usage surprises. Per task, we are the cheapest builder we know of.

Q.03

Which models do you support?

Every top frontier model. Anthropic, OpenAI, Google, xAI, Moonshot, MiniMax, Alibaba, Zhipu, DeepSeek, NVIDIA, and Mulu's own. Switch mid-task, no separate keys.

Q.04

Where does my data live?

Local first. Cloud sync is opt-in. Every model call routes through US infrastructure.

Q.05

Do I need to know how to code?

No. Plain English works. Voice input works too. Code is hidden by default and only shown if you ask for it.

Ready when you are.

Send me build-in-public updates