Safety Spec¶

Steerable ships a small but opinionated safety classifier for shell commands. The schema is CommandSafetyPattern; the runtime helper classify_shell_command(cmd) returns a severity grade plus the matching rule IDs.

CommandSafetyPattern¶

Field	Type	Required	Notes
`id`	`string`	yes	Stable rule identifier
`label`	`string`	yes	Short human-readable name
`description`	`string`	yes	Why the rule exists
`pattern`	`string` (regex)	yes	Matched against the full command line
`category`	`string`	yes	Free-form taxonomy bucket
`severity`	`'critical' \\| 'warning'`	yes	Determines UI behavior
`platform`	`'all' \\| 'unix' \\| 'windows'`	yes	Only evaluated on matching platforms

additionalProperties is disabled to keep rule processing deterministic across runtimes.

Severity grades¶

Severity	UI treatment
`critical`	Block by default. Surface a confirmation dialog if user-initiated.
`warning`	Show a banner but allow the command. Log every match.
`safe`	Auto-allow. (Returned by the classifier when no rule matches.)

safe is not a rule, it's the absence of a match.

Built-in rules (excerpt)¶

Category	Examples
Risky FS ops	`rm -rf /`, `rm -rf ~`, `rm -rf $HOME`
Privilege esc.	`sudo`, `su -`, `runas`
Remote pipe-exec	`curl … \\| sh`, `wget … \\| bash`
Dangerous git	`git push --force`, `git reset --hard`, `git clean -fd`
Disk destruction	`mkfs`, `dd if=… of=/dev/`
Network probes	`nmap`, `tcpdump` (warning, not critical)

The full rule set lives in packages/agent-runtime/py/src/steerable_agent_runtime/safety/builtins.py (canonical) and packages/agent-harness/ts/src/safety-patterns.ts (TS facade for parity tests).

Adding custom rules¶

from steerable_agent_protocol import CommandSafetyPattern
from steerable_agent_runtime.safety import classify_shell_command, register_rule

register_rule(CommandSafetyPattern(
    id="my-org-no-prod-db",
    label="No direct prod DB",
    description="Block any psql/mysql against the prod hostname.",
    pattern=r"(psql|mysql).*--host.*prod-db",
    category="data",
    severity="critical",
    platform="all",
))

result = classify_shell_command("psql --host prod-db -U admin")
# → {"severity": "critical", "matchedRules": ["my-org-no-prod-db"]}

Why a regex layer instead of a parser¶

Shell parsing is ambiguous (interactive shells, here-docs, eval, command substitution, …). A regex layer gives you a fast, conservative classifier that's easy to audit. It is not a sandbox — pair it with a real consent UI for local-mode tools (see Tools spec).