Defensive Runtime Research Without Exploit Publication
Studying agent failure without arming attackers.
Type: Research Direction
Status: Published
Published: April 26, 2026
Systems: cerberusboundary
There is a real tension between publishing useful safety research and publishing material that lowers the cost of misuse. The resolution is not silence; it is discipline about which parts of the work are shared.
### What Is Public-Safe
Controlled testing setups, evaluation harnesses, evidence-logging patterns, and mitigation workflows can be discussed publicly because they describe what defenders do. They do not need to be paired with reproducible exploit recipes to be useful.
### What Stays Internal
Specific exploit payloads, prompt-injection chains that survive current mitigations, and unpatched runtime weaknesses stay inside the lab. Cerberus is described publicly as a defensive harness, and that framing is load-bearing: it tells readers what the work is, and what it is not.
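One way to picture the split between the two lists is as a default-deny filter over evidence records. The sketch below is illustrative only and is not Cerberus code: the category names, the `EvidenceRecord` fields, and the `public_view` helper are all assumptions made up for this example. It assumes each record carries a category label, and it replaces the payload with a SHA-256 digest for anything not explicitly marked public-safe.

```python
import hashlib
from dataclasses import dataclass, asdict

# Hypothetical category labels, loosely mirroring the public-safe list above.
PUBLIC_SAFE = {
    "test_setup",
    "evaluation_harness",
    "evidence_log_pattern",
    "mitigation_workflow",
}


@dataclass(frozen=True)
class EvidenceRecord:
    """A single logged observation (field names are assumptions)."""
    run_id: str
    category: str
    outcome: str
    payload: str


def public_view(record: EvidenceRecord) -> dict:
    """Return a copy of the record that is safe to share.

    Public-safe categories pass through unchanged. Everything else,
    including categories this code has never seen, has its payload
    replaced by a digest (default-deny).
    """
    view = asdict(record)
    if record.category in PUBLIC_SAFE:
        return view
    raw = view.pop("payload")
    view["payload_sha256"] = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return view


if __name__ == "__main__":
    internal = EvidenceRecord("run-1", "exploit_payload", "blocked", "example text")
    print(sorted(public_view(internal)))
```

The digest lets people inside the lab confirm which payload a shared record refers to without the payload itself appearing in the record. Treating unknown categories as internal means a new, unlabeled artifact type is withheld until someone decides otherwise.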