Blog

Notes from the build.

Experiments, findings, and things we learned while building Promptzilla 3000.

30 April 2026

Does prompt refinement actually work on multi-agent systems? We tested it.

We ran a 4-agent data analysis system through adversarial testing across three iterations. One failure mode scored 42/100. Here's what we found — and what almost shipped.

Read →