Skip to content

Safety Spec

Steerable ships a small but opinionated safety classifier for shell commands. The schema is CommandSafetyPattern; the runtime helper classify_shell_command(cmd) returns a severity grade plus the matching rule IDs.

CommandSafetyPattern

Field Type Required Notes
id string yes Stable rule identifier
label string yes Short human-readable name
description string yes Why the rule exists
pattern string (regex) yes Matched against the full command line
category string yes Free-form taxonomy bucket
severity 'critical' \| 'warning' yes Determines UI behavior
platform 'all' \| 'unix' \| 'windows' yes Only evaluated on matching platforms

additionalProperties is disabled to keep rule processing deterministic across runtimes.

Severity grades

Severity UI treatment
critical Block by default. Surface a confirmation dialog if user-initiated.
warning Show a banner but allow the command. Log every match.
safe Auto-allow. (Returned by the classifier when no rule matches.)

safe is not a rule, it's the absence of a match.

Built-in rules (excerpt)

Category Examples
Risky FS ops rm -rf /, rm -rf ~, rm -rf $HOME
Privilege esc. sudo, su -, runas
Remote pipe-exec curl … \| sh, wget … \| bash
Dangerous git git push --force, git reset --hard, git clean -fd
Disk destruction mkfs, dd if=… of=/dev/
Network probes nmap, tcpdump (warning, not critical)

The full rule set lives in packages/agent-runtime/py/src/steerable_agent_runtime/safety/builtins.py (canonical) and packages/agent-harness/ts/src/safety-patterns.ts (TS facade for parity tests).

Adding custom rules

from steerable_agent_protocol import CommandSafetyPattern
from steerable_agent_runtime.safety import classify_shell_command, register_rule

register_rule(CommandSafetyPattern(
    id="my-org-no-prod-db",
    label="No direct prod DB",
    description="Block any psql/mysql against the prod hostname.",
    pattern=r"(psql|mysql).*--host.*prod-db",
    category="data",
    severity="critical",
    platform="all",
))

result = classify_shell_command("psql --host prod-db -U admin")
# → {"severity": "critical", "matchedRules": ["my-org-no-prod-db"]}

Why a regex layer instead of a parser

Shell parsing is ambiguous (interactive shells, here-docs, eval, command substitution, …). A regex layer gives you a fast, conservative classifier that's easy to audit. It is not a sandbox — pair it with a real consent UI for local-mode tools (see Tools spec).