Runtime boundary usage: inline guards before MCP/tool calls

by armorerlabs - opened about 6 hours ago

I have been comparing prompt-injection defenses from the agent-runtime side, where the guard sits directly before memory writes, stored outputs, or MCP/tool calls.

Classification quality is not the only constraint I am optimizing for. If the guard is on the hot path, its latency becomes product latency, and the runtime needs a small structured decision it can act on.

I wrote up the benchmark note for Armorer Guard here: https://armorerlabs.com/blog/armorer-guard-inline-prompt-injection-defense

In the default-threshold run, Guard completed 977 cases locally at 3.4 ms average / 4.3 ms p95, with no scanner network calls.
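For anyone who wants to collect comparable numbers for their own guard, here is a minimal sketch of a per-case timing loop with an average and a nearest-rank p95. It is not the harness behind the figures above; `measure_latency_ms`, `summarize`, and the `scan` callable are hypothetical names, and the nearest-rank percentile is just one common convention.

```python
import statistics
import time
from typing import Callable, Iterable


def measure_latency_ms(scan: Callable[[str], object], cases: Iterable[str]) -> list[float]:
    """Time each call to `scan` (any local guard function) in milliseconds."""
    samples = []
    for case in cases:
        start = time.perf_counter()
        scan(case)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Return the mean and the nearest-rank 95th percentile of the samples."""
    if not samples_ms:
        raise ValueError("no samples to summarize")
    ordered = sorted(samples_ms)
    # Nearest-rank p95: smallest value with at least 95% of samples at or below it.
    # Integer arithmetic (ceil(0.95 * n)) avoids float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return {"avg_ms": statistics.fmean(ordered), "p95_ms": ordered[rank - 1]}
```

Reporting p95 next to the average matters on the hot path, because the slowest scans are the ones users actually feel as tool-call delay.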

For HF builders using prompt-injection models or local guards: what output shape has been most useful in actual agent runtimes? I am leaning toward suspicious, reasons, confidence, scan id, sanitized text, and enough metadata to tie the decision back to the tool call boundary.
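To make the question concrete, here is a rough sketch of the shape I have in mind and how a runtime might act on it at the tool boundary. Everything here is hypothetical: `GuardDecision`, `toy_scan`, and `guarded_call` are illustrative names, not Armorer Guard's API, and the keyword check is a placeholder standing in for a real detector.

```python
from __future__ import annotations

import re
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class GuardDecision:
    suspicious: bool
    reasons: tuple[str, ...]
    confidence: float
    scan_id: str
    sanitized_text: str
    # Metadata tying the decision back to the tool call boundary.
    boundary: dict[str, str] = field(default_factory=dict)


def toy_scan(text: str, boundary: dict[str, str]) -> GuardDecision:
    """Placeholder heuristic (NOT a real detector): flags one known phrase."""
    marker = "ignore previous instructions"
    hit = marker in text.lower()
    sanitized = re.sub(re.escape(marker), "[removed]", text, flags=re.IGNORECASE)
    return GuardDecision(
        suspicious=hit,
        reasons=("instruction_override",) if hit else (),
        confidence=0.9 if hit else 0.1,  # made-up scores for illustration
        scan_id=str(uuid.uuid4()),
        sanitized_text=sanitized,
        boundary=dict(boundary),
    )


def guarded_call(
    tool: Callable[[str], str], tool_name: str, call_id: str, arg: str
) -> tuple[GuardDecision, str | None]:
    """Scan the argument first; run the tool only if the guard allows it."""
    decision = toy_scan(arg, {"tool": tool_name, "call_id": call_id})
    if decision.suspicious:
        return decision, None  # blocked: the caller can log decision.scan_id
    return decision, tool(decision.sanitized_text)
```

In this sketch a suspicious decision blocks the call outright. A runtime could instead pass `sanitized_text` through or escalate for review, which is part of why I want the decision to carry both the sanitized text and the boundary metadata.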
