Redact
Strip or hash sensitive fields out of MCAP recordings on the device, before they leave your network
This feature is in beta. Schema and CLI may change between releases — pin your alloy-edge version when authoring rules for production.
Some recordings carry data you don't want leaving the robot — operator names in metadata, microphone audio on /audio/**, hostnames echoed in std_msgs/String topics. Alloy Edge can rewrite or drop those records on the device before edge-sync uploads, so the cloud only ever sees the sanitised version.
The redactor is a streaming MCAP rewriter. It reads each record once, applies any matching rules, and writes a new file alongside the original — no buffering, no decode for channels you didn't list, and a self-documenting audit trail per file.
What you get
| Capability | What it does |
|---|---|
| Channel filter | Drop or allow whole topics by glob (/audio/**, /user/*). Runs before any decode — costs nothing. |
| Per-topic transforms | Rewrite individual messages with put (whitelist — spell out what you keep) or patch (denylist — override specific fields). Templates use Jinja2. |
| Metadata redaction | Same shape applied to MCAP metadata records (operator info, calibration, config blobs). |
| Regex redactors | Strip patterns out of string fields with regex_strip (one pattern + replacement) or regex_redact (apply a list of patterns, all matches → [REDACTED]). Bring your own patterns for emails, phone numbers, IPs, employee IDs, etc. |
| Hash salt rotation | hash(...) reads ${VAR} from the environment at config-load time. Rotate by changing the env var — no rule edits. |
| Dry-run mode | Write filtered output to <input_dir>/.dry-run/, skip uploads, inspect locally before flipping it on. |
| Self-documenting audit | JSONL sidecar + embedded MCAP metadata record per file. Each carries rules_hash so you can prove later which rules version produced a given file. |
| Failure policy | Fail-closed by default: a record whose rule fails is dropped rather than uploaded (skip_record), and skip_file sends the whole file to quarantine instead of the cloud. pass_original is opt-in. |
When to use it
| You want to… | Use redaction? |
|---|---|
| Drop a whole topic the cloud should never see | Yes — channel filter (channels.deny) |
| Hash an operator's name in metadata for compliance | Yes — metadata: rule with hash(...) |
| Replace one field in a known message with a constant | Yes — inline patch rule |
| Convert a high-bandwidth topic into a tiny summary | Yes — put rule with a Jinja template |
| Just stop recording the topic in the first place | No — narrow your recorder's topic list (Manage → Recording configuration) |
The recorder's topic list is your first line of defence — if a topic shouldn't be recorded at all, drop it there. Use redaction when you do need to record a topic (for replay, scenarios, or local diagnostics) but want to scrub something out before upload.
How it fits together
Redaction is configured in two files:
- edge-sync.yaml holds the redaction: block. It points at the rules file, sets the failure policy, and configures the audit log and quarantine directory.
- redaction.yaml holds the rules themselves: channel allow/deny, per-topic transforms, metadata-record transforms, named functions.
The rules file is a separate file because operators rotate it independently of edge-sync.yaml (different review cadence, sometimes different reviewers).
    recorder writes ──► /recordings/*.mcap ──► edge-sync picks up
                                                     │
                                                     ▼
                                              edge-transform
                                            (channel filter →
                                             per-topic transform)
                                                     │
                                                     ▼
                                       /quarantine/*.mcap ──► upload
                                       + audit JSONL line
                                       + audit MCAP record (embedded)

Quick start
Step 1: Author a rules file
Create /etc/alloy/redaction.yaml:
    enabled: true

    # Cheapest filter: drop topics before any decode.
    channels:
      allow: ["*"]
      deny:
        - "/audio/**"
        - "/user/*"

    # Per-topic mappings. First match wins.
    transforms:
      # Replace the data field with a hash.
      - match: "/robot_status"
        transform:
          type: patch
          schema: "std_msgs/msg/String"
          overrides:
            data: '{{ original | sha256_short }}'

    # MCAP metadata records, separate from message channels.
    metadata:
      - match: "operator_*"
        transform:
          type: patch
          overrides:
            operator_name: '{{ original | hash(algo="sha256") }}'
            # Strip emails out of free-text notes; the replacement is the second arg.
            operator_notes: '{{ original | regex_strip("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}", "[EMAIL]") }}'

For multi-pattern stripping (emails, phone numbers, and IPs in the same field), use regex_redact with a pattern list. It replaces every match with [REDACTED]:
    overrides:
      description: '{{ original | regex_redact([
        "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}",
        "\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b",
        "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"
      ]) }}'

The full schema (named functions, put vs patch, match selectors, hash salts, includes) lives in the redaction reference.
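To make the difference between the two helpers concrete, here is a minimal Python sketch of the behaviour described above. It is an illustration only: the function bodies are assumptions modelled on the descriptions of regex_strip and regex_redact, not the redactor's actual implementation, and the sample note is invented.

```python
import re

def regex_strip(value: str, pattern: str, replacement: str = "") -> str:
    """One pattern, one replacement: every match becomes `replacement`."""
    return re.sub(pattern, replacement, value)

def regex_redact(value: str, patterns: list[str]) -> str:
    """Apply each pattern in turn; every match becomes [REDACTED]."""
    for pattern in patterns:
        value = re.sub(pattern, "[REDACTED]", value)
    return value

# The same three patterns as the YAML examples, written as Python raw strings.
EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
PHONE = r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"
IPV4 = r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"

note = "Call 555-123-4567 or mail ops@example.com from 10.0.0.12"
print(regex_strip(note, EMAIL, "[EMAIL]"))
print(regex_redact(note, [EMAIL, PHONE, IPV4]))
```

The first call rewrites only the email address; the second replaces all three matches with [REDACTED]. Running your patterns against sample strings like this is a cheap way to check them before putting them in a rule.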
Step 2: Wire it into edge-sync.yaml
    redaction:
      enabled: true
      rules_file: /etc/alloy/redaction.yaml
      # Quarantine the whole file on a rule error. Alternatives: skip_record
      # (the default: drop bad records, keep filtering) or pass_original.
      on_rule_error: skip_file
      quarantine:
        mode: keep   # keep | delete; keep preserves originals so operators can review
        dir: /var/lib/alloy/edge-sync/quarantine
      audit:
        # Set to enable the JSONL sidecar; omit to disable.
        jsonl_path: /var/lib/alloy/edge-sync/redaction-audit.jsonl
        embed_in_mcap: true   # also write the audit summary inside the redacted file

Step 3: Dry-run before flipping it on
Set dry_run: true in redaction.yaml (or pass --dry-run to alloy-edge sync). The redactor writes filtered output to <input_dir>/.dry-run/<timestamp>/ (alongside the watched recordings) and skips uploads entirely. Review the result, confirm it matches expectations, then turn dry-run off.

    alloy-edge sync --config /etc/alloy/edge-sync.yaml --dry-run

Inspect the dry-run output the same way you inspect any MCAP: mcap info, mcap cat, or open it in Foxglove / Alloy Replay (drag-drop into the web uploader).
Step 4: Enable and watch
Once dry-run looks right, set dry_run: false, restart edge-sync, and tail the audit log:
    tail -f /var/lib/alloy/edge-sync/redaction-audit.jsonl

Each line is one file's redaction summary: messages and bytes in/out, per-channel dropped and transformed counts, denied channels, per-helper match counts, plus the rules_hash (sha256:…) of the config that produced it.
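Because the log is one JSON object per line, it is easy to post-process. The sketch below counts audit entries per rules_hash, which is one way to spot files produced by an older rules version. Only the rules_hash key comes from this page; the sample lines are stripped-down placeholders, not real audit output.

```python
import json
from collections import Counter
from typing import Iterable

def count_by_rules_hash(lines: Iterable[str]) -> Counter:
    """Count audit entries per rules_hash, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        counts[entry.get("rules_hash", "<missing>")] += 1
    return counts

# Placeholder entries: real lines carry many more fields.
sample = [
    '{"rules_hash": "sha256:aaaa"}',
    '{"rules_hash": "sha256:aaaa"}',
    "",
    '{"rules_hash": "sha256:bbbb"}',
]
print(count_by_rules_hash(sample))
```

Against a real log, pass an open file handle instead of the sample list, for example `count_by_rules_hash(open(path))` with the jsonl_path you configured.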
The two transform shapes — put vs patch
Each per-topic rule is one of two shapes. Pick based on how stable the upstream schema is.
put — whitelist (safe by default)
Spell out the entire output. Anything not mentioned in the template disappears. New upstream fields don't leak.
    - match: "/operator/command"
      transform:
        type: put
        schema: "your_msgs/OperatorCommand"
        template: |-
          {
            "operator_id": {{ operator_id | hash(algo="sha256") | tojson }},
            "command": {{ command | tojson }},
            "timestamp": {{ timestamp | tojson }}
          }

operator_id is hashed; command and timestamp pass through. Any other upstream field (notes, location, payload) disappears. That's the put semantic: spell out what you keep, everything else drops.
Use put for compliance channels — anything where "this field appeared upstream and we didn't redact it" would be a problem.
patch — denylist (concise)
Pass everything through, override only the listed fields. New upstream fields flow through unchanged.
    - match: "/robot_status"
      transform:
        type: patch
        schema: "std_msgs/msg/String"
        overrides:
          data: '{{ original | hash(algo="sha256") }}'

Use patch when the schema is stable and you only need to neutralise one or two fields. If upstream adds a new field you didn't anticipate, it'll flow through. That's the trade-off for the shorter rule.
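The whitelist/denylist difference can be modelled on plain dictionaries. The sketch below is a simplified illustration, not how the redactor is implemented: apply_put, apply_patch, mask, and passthrough are hypothetical names, and messages are treated as already-decoded dicts.

```python
from typing import Any, Callable

Message = dict[str, Any]
FieldFn = Callable[[Any], Any]

def apply_put(message: Message, keep: dict[str, FieldFn]) -> Message:
    """Whitelist: the output contains only the fields listed in `keep`."""
    return {field: fn(message[field]) for field, fn in keep.items()}

def apply_patch(message: Message, overrides: dict[str, FieldFn]) -> Message:
    """Denylist: copy every field, then rewrite only the listed ones."""
    out = dict(message)
    for field, fn in overrides.items():
        out[field] = fn(message[field])
    return out

def mask(_: Any) -> str:
    return "[MASKED]"

def passthrough(value: Any) -> Any:
    return value

# "notes" plays the role of a field that upstream added after the rule was written.
upstream = {"operator_id": "alice", "command": "dock", "notes": "new field"}

put_out = apply_put(upstream, {"operator_id": mask, "command": passthrough})
patch_out = apply_patch(upstream, {"operator_id": mask})

print(put_out)
print(patch_out)
```

The put output has no notes key at all, while the patch output still carries it unchanged. That is the leak the whitelist shape protects against.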
Examples here use hash(algo="sha256") (salted) for de-identification. There's also sha256_short for content fingerprinting — same value always hashes to the same 8-char digest, but unsalted — so it's reversible with a wordlist and not safe for de-id. See the helper reference.
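The wordlist risk is easy to demonstrate. In the sketch below, sha256_short is modelled as the first 8 hex characters of an unsalted SHA-256, and the salted helper is modelled with HMAC-SHA256. Both layouts are assumptions made for illustration; the real helpers may truncate or combine the salt differently.

```python
import hashlib
import hmac
from typing import Optional

def sha256_short(value: str) -> str:
    """Unsalted fingerprint: first 8 hex chars of SHA-256 (assumed layout)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]

def salted_hash(value: str, salt: str) -> str:
    """Salted digest via HMAC-SHA256 (assumed construction)."""
    return hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()

def reverse_with_wordlist(digest: str, candidates: list[str]) -> Optional[str]:
    """Recover the input by hashing every candidate and comparing digests."""
    for candidate in candidates:
        if sha256_short(candidate) == digest:
            return candidate
    return None

wordlist = ["alice", "bob", "carol"]
leaked = sha256_short("bob")
print(reverse_with_wordlist(leaked, wordlist))

# With a salt, the same name maps to different digests under different salts,
# so an attacker without the salt cannot precompute a lookup table.
print(salted_hash("bob", "salt-a") == salted_hash("bob", "salt-b"))
```

Anyone holding a list of plausible operator names can run the first loop against unsalted digests, which is why the unsalted helper suits content fingerprinting but not de-identification.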
When a rule fails
Templates can break: a Jinja syntax error, or a schema change that removes a field a template references. There are two distinct failure points:
- Config-load time: bad Jinja syntax (typo, unclosed tag, unknown filter) rejects the rules file before edge-sync starts. You'll see this immediately, not at upload time.
- Runtime: a syntactically valid template that fails on a particular message (e.g. {{ status[0].hardware_id }} when a message has an empty status). Behaviour is governed by on_rule_error:
  - on_rule_error: skip_record (default). The offending record is dropped, and the rest of the file is filtered and uploaded. One bad message doesn't sink the whole file.
  - on_rule_error: skip_file. The original lands in the quarantine directory and is not uploaded. Safest. The operator fixes the rule, removes the file from quarantine, and edge-sync retries on its next sweep.
  - on_rule_error: pass_original. Fail-open: the unredacted original uploads. Requires explicit opt-in because it can leak. Use only when "drop nothing on the floor" beats "leak nothing", and document why.
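The three runtime policies can be summarised as a small decision function. This is a simplified model under stated assumptions: redact_file and first_hardware_id are hypothetical names, records are plain dicts held in a list (the real redactor streams), and the return value only signals what would be uploaded and whether the original would be quarantined.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]

def redact_file(
    records: list[Record],
    rule: Callable[[Record], Record],
    on_rule_error: str = "skip_record",
) -> tuple[Optional[list[Record]], bool]:
    """Return (records_to_upload, quarantined) for one file."""
    output: list[Record] = []
    for record in records:
        try:
            output.append(rule(record))
        except (KeyError, IndexError, TypeError):
            if on_rule_error == "skip_record":
                continue  # drop only the offending record
            if on_rule_error == "skip_file":
                return None, True  # nothing uploads; the original is quarantined
            if on_rule_error == "pass_original":
                return list(records), False  # fail-open: unredacted original uploads
            raise ValueError(f"unknown on_rule_error: {on_rule_error}")
    return output, False

def first_hardware_id(record: Record) -> Record:
    # Mirrors {{ status[0].hardware_id }}: raises IndexError on an empty status.
    return {"hardware_id": record["status"][0]["hardware_id"]}

records = [{"status": [{"hardware_id": "arm-1"}]}, {"status": []}]

for policy in ("skip_record", "skip_file", "pass_original"):
    print(policy, redact_file(records, first_hardware_id, policy))
```

With the second record's empty status, skip_record uploads one redacted record, skip_file uploads nothing and flags quarantine, and pass_original returns both records untouched.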
The audit trail
Every redacted file gets two parallel audit records:
| Carrier | Where | Why |
|---|---|---|
| JSONL sidecar | {state_dir}/redaction-audit.jsonl (one line per file) | Cheap to grep, cheap to stream — the operational logbook |
| Embedded MCAP record | A metadata record named alloy.redaction.audit inside the redacted file | The redacted file documents itself — survives re-uploads, copies, and forwards |
The two carriers contain the same JSON shape. Both default on. Turn embed_in_mcap off only when the rule layout itself is sensitive (e.g. you don't want the redacted file to disclose which topics were touched).
Each audit entry includes a rules_hash (formatted sha256:<hex>) over the merged config so you can prove later which version of the rules produced a given file.
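A hash over a config only works as evidence if the same rules always serialise to the same bytes. The sketch below shows one common way to get that property, hashing a canonical JSON form with sorted keys. The canonicalisation is an assumption for illustration; this page does not specify how the redactor serialises the merged config, so do not expect these digests to match real rules_hash values.

```python
import hashlib
import json
from typing import Any

def rules_hash(merged_config: dict[str, Any]) -> str:
    """sha256:<hex> over a canonical JSON rendering of the config."""
    canonical = json.dumps(merged_config, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same rules, different key order: the hash should not change.
a = {"enabled": True, "channels": {"deny": ["/audio/**"]}}
b = {"channels": {"deny": ["/audio/**"]}, "enabled": True}
# One extra deny entry: the hash should change.
c = {"enabled": True, "channels": {"deny": ["/audio/**", "/user/*"]}}

print(rules_hash(a) == rules_hash(b))
print(rules_hash(a) == rules_hash(c))
```

Key order does not affect the result, but any change to a rule does, which is the property a reviewer relies on when matching a file to a rules version.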
Hash salts and rotation
{{ original | hash(algo="sha256") }} and friends use a salt configured at the top of redaction.yaml:
    hash_salt: "${ALLOY_HASH_SALT}"

The ${VAR} form is interpolated from the environment at config-load time. Rotate by changing the env var and restarting edge-sync; no rule edit is needed. The audit log records hash_salt_fingerprint: sha256(salt)[:8] per file so a compliance reviewer can prove a rotation happened without ever seeing the salt.
If the env var isn't set when the config loads, the loader fails fast — you can't accidentally ship with a missing salt.
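The load-time behaviour can be pictured as follows. This is a sketch under assumptions: interpolate and salt_fingerprint are hypothetical names, the fingerprint is taken as the first 8 hex characters of SHA-256 over the UTF-8 salt (following the sha256(salt)[:8] notation above), and the salt values are invented.

```python
import hashlib
import os
import re
from typing import Mapping

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Expand ${VAR} references, failing fast when a variable is unset."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise ValueError(f"environment variable {name} is not set")
        return env[name]
    return _VAR.sub(lookup, value)

def salt_fingerprint(salt: str) -> str:
    """Short, non-reversible marker that changes when the salt changes."""
    return hashlib.sha256(salt.encode("utf-8")).hexdigest()[:8]

old_salt = interpolate("${ALLOY_HASH_SALT}", {"ALLOY_HASH_SALT": "salt-2024"})
new_salt = interpolate("${ALLOY_HASH_SALT}", {"ALLOY_HASH_SALT": "salt-2025"})
print(salt_fingerprint(old_salt), salt_fingerprint(new_salt))

try:
    interpolate("${ALLOY_HASH_SALT}", {})
except ValueError as err:
    print("config rejected:", err)
```

The two fingerprints differ, which is what lets a reviewer confirm a rotation from the audit log alone, and the empty environment is rejected before any file is processed.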
Channel filter is the fastest
Rules under transforms: decode the message; rules under channels.deny don't. If a topic should never leave the robot, deny it at the channel filter — it costs nothing and never decodes.
Only reach for a rule under transforms: when you need to keep some part of a message and remove or rewrite the rest.
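The ordering matters because a denied record is rejected on its topic name alone. The sketch below illustrates that with a decode counter. It is not the redactor's code: filter_then_decode is a hypothetical name, decode stands in for the transform stage, and Python's fnmatchcase treats * and ** identically, so the real glob semantics may be stricter.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, Iterator

def filter_then_decode(
    records: Iterable[tuple[str, bytes]],
    deny: list[str],
    decode: Callable[[bytes], object],
) -> Iterator[tuple[str, object]]:
    """Drop denied topics by name first; decode only what survives."""
    for topic, payload in records:
        if any(fnmatchcase(topic, pattern) for pattern in deny):
            continue  # payload is never touched
        yield topic, decode(payload)

decoded_count = 0

def counting_decode(payload: bytes) -> str:
    global decoded_count
    decoded_count += 1
    return payload.decode("utf-8")

# The audio payload is not valid UTF-8, so decoding it would raise.
records = [("/audio/left/raw", b"\xff\xfe"), ("/robot_status", b"ok")]
kept = list(filter_then_decode(records, ["/audio/**", "/user/*"], counting_decode))
print(kept, decoded_count)
```

Only /robot_status is decoded; the audio record is skipped without its payload ever being read, so the invalid bytes never cause an error.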
Limits and things to know
- MCAP only. Other recording formats aren't supported by the redactor.
- First match wins. Order rules from most-specific to most-general.
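The ordering advice follows directly from first-match-wins evaluation, as this small sketch shows. The rule names are invented, and glob matching via fnmatchcase is an assumption standing in for the match selector forms covered in the redaction reference.

```python
from fnmatch import fnmatchcase
from typing import Optional

def first_matching_rule(topic: str, rules: list[tuple[str, str]]) -> Optional[str]:
    """Return the name of the first rule whose pattern matches the topic."""
    for pattern, rule_name in rules:
        if fnmatchcase(topic, pattern):
            return rule_name
    return None

# Hypothetical rules: one specific to a topic, one catch-all for its namespace.
specific_first = [
    ("/operator/command", "hash-operator-id"),
    ("/operator/*", "generic-scrub"),
]
general_first = list(reversed(specific_first))

print(first_matching_rule("/operator/command", specific_first))
print(first_matching_rule("/operator/command", general_first))
```

With the general pattern listed first, the specific rule is never reached, so /operator/command silently gets the generic treatment.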
Next steps
- Redaction reference: the full redaction.yaml schema, covering every field, the function library, and the match selector forms.
- Configuration reference: the redaction: block in edge-sync.yaml (rules file path, quarantine, audit, failure policy).