The promise
Most tools do one of three things: generate, test, or optimise. Promptzilla does all three, in order, so what you ship is the strongest version of the prompt, not the first one that ran.
A single prompt, a full agentic team (orchestrator, specialists, handoff contracts, error recovery), or a skill-equipped agent. Generation is model-aware — written differently for Anthropic, OpenAI, Google, Meta, and DeepSeek because each parses instructions differently.
Kaiju Lab throws seven adversarial categories at your prompt, including edge cases, ambiguity, out-of-scope requests, prompt injection, and jailbreaks, and scores every output across four weighted dimensions. It catches the regressions that pass a happy-path check and break in production.
The Optimiser hardens effectiveness — vague directives become testable imperatives, failure modes get explicit handling — then strips 30–50% of the tokens without losing intent. Lower latency, lower cost, less surface area for the model to get lost in. Battle proves the new version actually wins.
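As a rough illustration of what scoring "across four weighted dimensions" could look like, here is a minimal sketch. The dimension names and weights are hypothetical; the page does not name them, and this is not Kaiju Lab's actual scoring code.

```python
# Hypothetical dimension names and weights (the page only says
# "four weighted dimensions" and does not name them).
WEIGHTS = {
    "accuracy": 40,
    "format": 20,
    "safety": 20,
    "helpfulness": 20,
}


def weighted_score(scores: dict[str, float], weights: dict[str, int] = WEIGHTS) -> float:
    """Combine per-dimension scores (0-100) into a single 0-100 score."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in weights) / total_weight


# An output that is accurate and safe but weaker on format and helpfulness.
example = {"accuracy": 100, "format": 50, "safety": 100, "helpfulness": 50}
print(weighted_score(example))  # 80.0
```

Weighting lets a failure on a heavily weighted dimension pull the total down further than the same failure on a lightly weighted one.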
Generate — real outputs
A full agentic team. An importable n8n workflow. A repo-grounded rules file. A platform-precise visual prompt. Not snippets — complete, production-shaped output from a single brief.
Brief: “A cultural research team that delivers platform trend intelligence and brand activation ideas for a given client and market.”
Generated: 7 agents. Full production system prompts. JSON handoff contracts. Error recovery. Self-verification. ~1 minute.
Every output, every tool, every test case — free to try. Bring your own API key. Zero markup.
Test — proof
A real prompt, taken through three iterations of testing and optimisation. Watch the suite score climb — and watch the regression caught between iterations 2 and 3 that a happy-path check would have shipped. This is what battle-tested actually means.
The prompt: a customer support agent handling billing disputes. A customer emails about an unexpected charge on their statement. The prompt needs to acknowledge the concern, commit to investigating, and respond in structured JSON — no speculation, no invented commitments, no over-promising.
Kaiju Lab scored the original 86/100 on a realistic billing complaint. We ran it through the Optimiser. By iteration two, the adversarial suite had climbed to 86 — but the same realistic billing email now scored 34. The prompt had become so rigid it refused to help a completely normal customer message, citing insufficient input on a query that had all the information it needed.
Most testing tools would have shown you the 86 and declared success. The Lab showed you both numbers.
The Optimiser produced a fully restructured prompt with explicit rules, format constraints, and refusal criteria. Suite scores climbed, but the single-run score dropped: the model became more rigid on a clean customer email.
Surgical patches pushed the suite score to 86. But the same realistic customer email scored 34 — the prompt now refused to help, citing INSUFFICIENT_INPUT. Most tools would have shipped this version. The Lab caught it.
One more pass with feedback focused on the regression. v3 kept the adversarial gains (suite still 86) and recovered the realistic case to 90. Better than where we started, with all the structural improvements.
This is what ‘refinement that's actually proven’ looks like. Numbers, regressions, recovery — all visible, all in the platform.
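The regression described above can be expressed as a simple gate: compare each candidate version against the baseline on every check, not only the adversarial suite. The sketch below uses the scores quoted in this walkthrough; the function name, check names, and the 5-point tolerance are assumptions for illustration, not the platform's implementation.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 5.0,  # hypothetical allowance for run-to-run noise
) -> list[str]:
    """Return the checks where the candidate scores more than `tolerance` below baseline."""
    return [
        name
        for name, base in baseline.items()
        if name in candidate and candidate[name] < base - tolerance
    ]


# Scores quoted in the walkthrough above.
original = {"realistic_billing_email": 86}
v2 = {"adversarial_suite": 86, "realistic_billing_email": 34}
v3 = {"adversarial_suite": 86, "realistic_billing_email": 90}

print(find_regressions(original, v2))  # ['realistic_billing_email']
print(find_regressions(original, v3))  # []
```

Looking only at the suite score, v2 and v3 are indistinguishable at 86; the realistic case is what separates them.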
Optimise
Most prompt tools stop at “make it work.” The Optimiser makes it sharper and cheaper — then Battle proves the new version actually beats the old one on your test cases.
Every vague or hedged directive becomes a precise, testable imperative. Hallucination guards, explicit failure handling for the top domain-specific edge cases, tightened output contracts. Working instructions are preserved, never traded away for new ones.
Aggressive token reduction — filler removed, overlapping instructions merged, restated defaults deleted. Target: 30–50% fewer tokens with 100% of the critical information intact. Shorter prompts mean lower cost, lower latency, and less room for the model to drift.
Every change is annotated with the reasoning behind it and re-scored against the same suite. You see exactly what moved and why — you never ship on a hunch.
Run the old version head-to-head against the new one on identical adversarial inputs. The winner is decided on evidence, with full history kept so you can roll back any time.
A 40% shorter prompt that scores higher isn't a trade-off. It's the one you should have shipped in the first place.
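A minimal sketch of the two checks behind that claim, token reduction and a head-to-head comparison on identical cases. The function names, case names, and numbers are illustrative assumptions, not Promptzilla's Battle implementation or real results.

```python
def token_reduction(old_tokens: int, new_tokens: int) -> float:
    """Percentage of tokens removed going from the old prompt to the new one."""
    if old_tokens <= 0:
        raise ValueError("old_tokens must be positive")
    return 100.0 * (old_tokens - new_tokens) / old_tokens


def battle(old_scores: dict[str, float], new_scores: dict[str, float]) -> dict[str, int]:
    """Count per-case wins when both versions are scored on the same test cases."""
    if set(old_scores) != set(new_scores):
        raise ValueError("both versions must be scored on the same cases")
    tally = {"old": 0, "new": 0, "tie": 0}
    for case, old in old_scores.items():
        new = new_scores[case]
        if new > old:
            tally["new"] += 1
        elif new < old:
            tally["old"] += 1
        else:
            tally["tie"] += 1
    return tally


# Illustrative numbers only.
print(token_reduction(1000, 600))  # 40.0
old = {"edge_case": 70, "prompt_injection": 60, "ambiguity": 80}
new = {"edge_case": 85, "prompt_injection": 75, "ambiguity": 80}
print(battle(old, new))  # {'old': 0, 'new': 2, 'tie': 1}
```

Requiring identical case sets keeps the comparison fair: a version cannot win by being scored on easier inputs.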
The Loop
Generate a complete agentic team, a prompt, a rules file, or a visual prompt.
Stress-test with Kaiju Lab's 7 auto-generated test categories.
Sharpen effectiveness and cut 30–50% of tokens with the Optimiser. Prove the win head-to-head with Battle.
Import the workflow JSON into n8n. Drop the bundle into .claude/agents/. Or export CLAUDE.md, .cursorrules, markdown, or JSON. Paste and deploy.
No other platform closes the loop — from generation to adversarially-scored, versioned deployment — across agents, prompts, rules, and visual output.
Start from working config
Search GitHub for CLAUDE.md, .cursorrules, AGENTS.md and n8n workflows in real repos. Read them in-app, pull them in, then run them through the Optimiser like anything else.
Live GitHub search by stack — Claude Code, Cursor, n8n, MCP, agents. Stars-ranked, not a hand-curated list.
Require the exact artefact: CLAUDE.md, .cursorrules, .cursor/rules, AGENTS.md, SKILL.md. Repos without it drop out.
README and a generated how-to in-app — see how the file is wired before you pull anything in.
Import the file, then optimise or stress-test it the same way you would your own.
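The "require the exact artefact, rank by stars" behaviour described above could be sketched as follows. The repo records, field names, and helper functions are hypothetical stand-ins for search results; this does not call GitHub and is not the product's search code.

```python
def has_artefact(paths: list[str], artefact: str) -> bool:
    """True if any path is the artefact itself, or sits under it when it is a directory."""
    for path in paths:
        if (
            path == artefact
            or path.endswith("/" + artefact)
            or path.startswith(artefact + "/")
            or ("/" + artefact + "/") in path
        ):
            return True
    return False


def filter_and_rank(repos: list[dict], artefact: str) -> list[dict]:
    """Drop repos without the artefact, then sort the rest by stars, highest first."""
    keep = [repo for repo in repos if has_artefact(repo["paths"], artefact)]
    return sorted(keep, key=lambda repo: repo["stars"], reverse=True)


# Hypothetical search results.
repos = [
    {"name": "alpha", "stars": 120, "paths": ["README.md", "CLAUDE.md"]},
    {"name": "beta", "stars": 900, "paths": ["README.md", "docs/NOT_CLAUDE.md"]},
    {"name": "gamma", "stars": 450, "paths": ["pkg/CLAUDE.md", ".cursor/rules/style.mdc"]},
]

print([r["name"] for r in filter_and_rank(repos, "CLAUDE.md")])      # ['gamma', 'alpha']
print([r["name"] for r in filter_and_rank(repos, ".cursor/rules")])  # ['gamma']
```

Matching on whole path segments rather than substrings is what keeps a file like docs/NOT_CLAUDE.md from counting as CLAUDE.md.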
Built for how you work
Generate CLAUDE.md and .cursorrules grounded in your real repo — paste your tree and package.json, get rules with your actual paths and deps. Generate Claude Code sub-agent teams and export them as .claude/agents/ bundles. Generate full n8n workflows and download importable workflow JSON — real nodes, wired connections, paste straight into n8n. Skip the 40 minutes of setup before you ship.
Rules Generator · Agentic Team Generator · n8n Workflow export · .claude/agents/ bundles
Generate complete agentic teams with orchestrators, specialists, handoff contracts, and error recovery. Stress-test any prompt or team against 7 adversarial test categories in Kaiju Lab, scored by an AI judge across four dimensions. Compare versions head-to-head with Battle. Apply targeted improvements with scored diffs in the Optimiser. Every change is annotated with reasoning, so you never ship on a hunch.
Agentic Team Generator · Kaiju Lab · Optimiser · Battle
Platform-precise prompts for Midjourney, DALL-E 3, Stable Diffusion, Flux, Ideogram, Imagen, Firefly, and Nano Banana. Cinematic video prompts for Veo 3, Runway Gen-4, Kling, Hailuo, Luma, and Seedance — 50+ directorial references, full control over shot, camera, tone, and lighting. Plus the Reverse Image Engine: upload any image, get the prompt that made it.
Image Prompt Generator · Video Prompt Generator · Reverse Image Engine
In the box
One tier. No markup. Pay your provider directly.
Free Tier
Free forever. No card required.
Promptzilla
Cancel anytime. Access until end of billing period.
Platform-only pricing. £5/month covers Promptzilla; you pay Anthropic directly at cost. No markup, no proxying, no surprise bills. Multi-provider via OpenRouter coming soon.
Payments by Stripe.
Questions