Redaction

Schema for redaction.yaml — channels, named functions, per-topic transforms, metadata mappings

The canonical reference for redaction.yaml. For the why-and-when, see Redact.

# Master switch.
enabled: true

# Optional — merge named functions from additional files.
includes:
  - /etc/alloy/transforms-common.yaml

# Topic-level allow/deny. Cheapest filter — runs before any decode.
channels:
  allow: ["*"]
  deny: ["/user/*", "/audio/**"]

# Salt for hash(...) helpers. ${VAR} is read from the environment at load time.
hash_salt: "${ALLOY_HASH_SALT}"

# Per-channel mappings. First match wins.
transforms: [...]

# Per-metadata-record mappings. Same shape as transforms.
metadata: [...]

# The named-function library. Referenced by transforms / metadata via `function:`.
functions: { ... }

# Optional — rule-file-level audit defaults. Operational overrides (CLI flags,
# edge-sync.yaml's `audit:` block) take precedence. Useful for "this rule set
# always wants its audit embedded" without forcing every call site to pass a flag.
audit:
  embed_in_mcap: true

# Write filtered output to a sandbox dir and skip uploads.
dry_run: false

Top-level

Field	Type	Default	Description
`enabled`	bool	`false`	Master switch. `false` → the redactor is bypassed even if edge-sync's `redaction.enabled` is true.
`includes`	list of paths	`[]`	Additional files merged into the same `functions:` namespace, in declaration order. Last file wins on name collision; the entry-point file's own `functions:` always wins overall. Resolved relative to the directory of the file that contains them.
`channels`	object	`{}`	Topic-level allow/deny. See below.
`hash_salt`	string	—	Salt for `hash(...)` and friends. `${VAR}` is interpolated from the environment at load time. Missing env var → load-time error.
`functions`	map	`{}`	Named transform bodies — the "library" referenced by `transforms:` and `metadata:`.
`transforms`	list	`[]`	Per-channel mappings. First match wins.
`metadata`	list	`[]`	Per-metadata-record mappings. Same shape as `transforms` but `match:` globs against record names.
`audit`	object	—	Rule-file-level audit defaults. Currently one field: `embed_in_mcap` (bool). Absent → defer to `edge-sync.yaml`'s `audit:` block / CLI flags.
`dry_run`	bool	`false`	When `true`, write filtered output to `<input_dir>/.dry-run/<timestamp>/` and skip uploads. CLI `--dry-run` overrides the file.

`channels`

Topic-level allow/deny — the cheapest filter. Runs before any decode, so denied channels never pay the CDR-decode cost.

Field	Type	Default	Description
`allow`	list of globs	—	Fail-closed. `null` or `[]` → no topic passes. `["*"]` → all topics pass (only `deny` filters). `["/odom", "/tf"]` → only matching topics pass.
`deny`	list of globs	`[]`	Always wins on conflict — a `deny` match drops the topic regardless of `allow`.

Globs are POSIX-style: `*` matches any single segment, `**` matches across segments.
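As an illustration of how `allow` and `deny` combine, here is a minimal sketch. The topic names are hypothetical, and the outcomes in the comments are derived from the rules above (fail-closed `allow`, `deny` always wins), not from a captured run.

```yaml
channels:
  allow: ["/odom", "/sensors/*"]
  deny: ["/sensors/gps"]

# Expected outcome under the rules above:
#   /odom          passes  (matches allow, no deny match)
#   /sensors/imu   passes  (matches /sensors/*)
#   /sensors/gps   dropped (deny wins over the /sensors/* allow)
#   /diagnostics   dropped (allow is fail-closed; no glob matches)
```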

`transforms[]` / `metadata[]`

Each entry has exactly one of `function:` or `transform:`. Both or neither → load-time error.

Field	Type	Required	Description
`match`	string or object	yes	Channel/record selector. See Match selector below.
`function`	string	one of	Reference to a named function in `functions:` (or merged from `includes:`).
`transform`	object	one of	Inline transform body — same shape as a `functions:` entry.
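A sketch of the two entry styles side by side. The topic names are hypothetical, and `redact_operator_cmd` is assumed to be defined under `functions:` (or in an included file).

```yaml
transforms:
  # Reference a named function from the library.
  - match: "/operator/cmd"
    function: redact_operator_cmd

  # Inline body: same shape as a functions: entry, written at the call site.
  - match: "/tf"
    transform:
      type: put          # no template, so this is the identity put
```

An entry that set both `function:` and `transform:`, or neither, would be rejected at load time.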

Match selector

Two forms: a string shorthand and a full object. The shorthand is equivalent to an object with only the primary selector set, so use it for the common case.

# Shorthand — channel only (or record-name only for `metadata:`)
match: "/diagnostics"

# Object form — full selector
match:
  channel: "/diagnostics"
  schema: "diagnostic_msgs/DiagnosticArray"   # disambiguates same-topic-different-schema (rare but legal)
  encoding: "cdr"                             # or "ros2msg", "protobuf", etc.

For the `metadata:` block, the shorthand maps to `record_name` instead of `channel`:

match: "operator_*"           # shorthand → record_name

match:
  record_name: "operator_*"   # object form
  key: "operator_email"       # optional inner-key glob

Field	Used by	Description
`channel`	`transforms`	Topic glob (e.g. `/diagnostics`, `/sensors/*`).
`schema`	`transforms`	Optional schema-name glob — disambiguates two channels with the same topic but different schemas.
`encoding`	`transforms`	Optional encoding glob — `cdr`, `ros2msg`, `protobuf`.
`record_name`	`metadata`	MCAP metadata-record name glob.
`key`	`metadata`	Optional inner-key glob within a metadata record.

Transform bodies — `put` and `patch`

Each transform body has a `type:` discriminator. v1 ships two: `put` and `patch`.

`type: put` — whitelist

Spell out the entire output. Fields not mentioned in the template disappear. New upstream fields don't leak.

Field	Type	Default	Description
`schema`	string	—	ROS2 schema name the template targets (e.g. `diagnostic_msgs/DiagnosticArray`). Optional for `metadata:` rules.
`available_fields`	list of strings	`[]`	Documentation-only — human-readable list of fields the template references. Not enforced.
`template`	string (Jinja2)	—	Jinja2 template rendering the full output JSON. Omit `template:` for the identity put — the runtime short-circuits to a zero-copy passthrough (same cost as having no rule), useful when an operator wants the config to document a deliberate "this channel is fine, leave it alone."
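A minimal sketch of a whitelist body that uses all three fields. The schema `your_msgs/RobotStatus` and its field names are hypothetical, chosen only to show that a field left out of the template does not reach the output.

```yaml
type: put
schema: "your_msgs/RobotStatus"                        # hypothetical schema
available_fields: ["battery_level", "serial_number"]   # documentation only, not enforced
template: |-
  {
    "battery_level": {{ battery_level | tojson }},
    "serial_number": {{ serial_number | hash(algo="sha256") | tojson }}
  }
# Any other field on the message (say, a hypothetical operator_name) is not
# mentioned in the template, so it is absent from the output.
```

Because the template calls `hash(...)`, the rules file using this body would also need `hash_salt` set.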

`type: patch` — denylist

Pass everything through, override only the listed fields.

Field	Type	Default	Description
`schema`	string	—	ROS2 schema name. Optional for `metadata:` rules.
`overrides`	map	`{}`	Field-path → Jinja2 expression. Path syntax is `field`, `field.sub`, `array[].field`.
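The three path forms can be sketched in one body. The schema and field names below are hypothetical; the keys containing `.` or `[]` are quoted only to keep the YAML unambiguous, which is a stylistic assumption and not a documented requirement.

```yaml
type: patch
schema: "your_msgs/RobotStatus"     # hypothetical schema and field names
overrides:
  # field
  operator_name: '{{ original | hash(algo="sha256") }}'
  # field.sub
  "header.frame_id": '{{ original | zero }}'
  # array[].field
  "faults[].message": '{{ original | regex_strip("[0-9]+", "[N]") }}'
```

Every field not listed under `overrides:` passes through unchanged.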

`functions`

Named transform bodies. Same shape as a `transform:` entry, minus the `match:` key (which lives on the call site).

functions:
  # Identity put — runtime short-circuits to zero-copy passthrough.
  identity_pass:
    type: put

  # Whitelist redactor for an operator-command message.
  redact_operator_cmd:
    type: put
    schema: "your_msgs/OperatorCommand"
    template: |-
      {
        "operator_id": {{ operator_id | hash(algo="sha256") | tojson }},
        "command": {{ command | tojson }},
        "timestamp": {{ timestamp | tojson }}
      }

  # Denylist counterpart — same schema, only override two fields.
  redact_operator_cmd_lite:
    type: patch
    schema: "your_msgs/OperatorCommand"
    overrides:
      operator_id: '{{ original | hash(algo="sha256") }}'
      notes: '""'

Functions from `includes:` land in the same namespace; local definitions in the entry-point file always win on collision.
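To make the collision rule concrete, here is a sketch with one included file and one entry-point file. They are shown as two YAML documents purely for display; the function bodies are illustrative assumptions.

```yaml
# /etc/alloy/transforms-common.yaml (included file)
functions:
  identity_pass:
    type: put
  redact_operator_cmd:
    type: patch
    schema: "your_msgs/OperatorCommand"
    overrides:
      operator_id: '{{ original | zero }}'
---
# redaction.yaml (entry point)
hash_salt: "${ALLOY_HASH_SALT}"
includes:
  - /etc/alloy/transforms-common.yaml
functions:
  # Same name as the included definition; this local one is used.
  redact_operator_cmd:
    type: patch
    schema: "your_msgs/OperatorCommand"
    overrides:
      operator_id: '{{ original | hash(algo="sha256") }}'
# identity_pass is still available, merged in from the include.
```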

Template helpers

Inside `template:` and `overrides:` expressions, the rendering engine is MiniJinja (Jinja2-compatible). On top of the standard filters, the redactor adds:

Helper	Use	Example
`hash(algo=...)`	Full hex digest of the salted value (`sha256(salt + value)` when `algo="sha256"`). `algo`: `md5`, `sha256`. Use this for de-identification. Requires `hash_salt`.	`'{{ original | hash(algo="sha256") }}'`
`sha256_short`	First 8 hex chars of `sha256(value)`, unsalted. Content fingerprint, not de-identification (reversible with a wordlist).	`'{{ s.name | sha256_short }}'`
`regex_strip(pattern, replacement)`	Replace regex matches in the value with a literal string.	`'{{ original | regex_strip("[A-Z]+", "[X]") }}'`
`regex_redact([patterns])`	Apply a list of regex patterns; replace every match with `[REDACTED]`. Convenience wrapper for chaining several `regex_strip` calls; patterns are user-supplied.	`'{{ original | regex_redact([email_re, phone_re]) }}'`
`redact_if(predicate, replacement)`	Replace the value only when the predicate matches; otherwise pass through.	`'{{ original | redact_if("hostname", "[HOST]") }}'`
`zero`	Type-aware zero: `""` for strings, `0` for numbers, etc.	`'{{ original | zero }}'`
`tojson`	JSON-encode a value (custom registration; minijinja's built-in `tojson` requires a feature this crate doesn't pull in).	`'{{ header | tojson }}'`

`original` is a context variable, not a filter. It's bound only inside `patch.overrides` (the pre-override field value) and `metadata:` rule expressions (the pre-redaction string). `put.template` does not see `original`; it sees the full decoded message and addresses fields by name (`{{ header | tojson }}`, `{{ status[0].level }}`).

`hash_salt` must be set if any rule calls `hash(...)`. `sha256_short` does not consume the salt.
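A short sketch contrasting the salted and unsalted helpers in one rule. The function name `deidentify_operator` is hypothetical; the schema and field names are reused from the `functions` example.

```yaml
hash_salt: "${ALLOY_HASH_SALT}"     # required because hash(...) is used below

functions:
  deidentify_operator:              # hypothetical function name
    type: patch
    schema: "your_msgs/OperatorCommand"
    overrides:
      # Salted digest: suitable for de-identification.
      operator_id: '{{ original | hash(algo="sha256") }}'
      # Unsalted 8-char fingerprint: does not read hash_salt.
      notes: '{{ original | sha256_short }}'
```

If `ALLOY_HASH_SALT` is not set in the environment, the file fails at load time.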

Resolution rules

Channel filter runs first. Denied topics drop before any decode.
First match wins per channel / record name. Anything not listed in `transforms:` is an implicit passthrough: no decode, no allocation, byte-for-byte copy.
`includes:` merge in declaration order (last include wins on function-name collision); local `functions:` in the entry-point file override every include.
Bad templates fail at load time. A Jinja syntax error in any template / override expression rejects the rules file before the agent starts.
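Because the first match wins, rule order matters when globs overlap. A minimal sketch, assuming hypothetical topic names and a hypothetical `redact_gps` function defined elsewhere:

```yaml
transforms:
  - match: "/sensors/gps"      # specific topic first
    function: redact_gps       # hypothetical function name
  - match: "/sensors/*"        # broader glob second
    function: identity_pass

# Resolution under the rules above:
#   /sensors/gps  -> redact_gps     (first matching entry)
#   /sensors/imu  -> identity_pass
#   /odom         -> not listed, so implicit passthrough
```

Swapping the two entries would let `/sensors/*` match `/sensors/gps` first, and the GPS rule would never apply.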

Reload semantics

Edits to `redaction.yaml` (or any included file) take effect only after edge-sync restarts. The rules file is read once at startup; there is no file watcher today. Hot-reload is planned for a future release, using the `source_files` set the loader populates.

The merged config is hashed (`sha256:<hex>`, recorded as `rules_hash` on every audit entry) so you can prove later which version of the rules produced a given file.
