Redact
Strip or hash sensitive fields out of MCAP recordings on the device, before they leave your network
This feature is in beta. Schema and CLI may change between releases — pin your alloy-edge version when authoring rules for production.
Some recordings carry data you don't want leaving the robot — operator names in metadata, microphone audio on /audio/**, hostnames echoed in std_msgs/String topics. Alloy Edge can rewrite or drop those records on the device before edge-sync uploads, so the cloud only ever sees the sanitised version.
The redactor is a streaming MCAP rewriter. It reads each record once, applies any matching rules, and writes a new file alongside the original — no buffering, no decode for channels you didn't list, and a self-documenting audit trail per file.
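To make the streaming model concrete, here is a minimal Python sketch of a single-pass rewriter. It is not alloy-edge's code, and it simplifies several things:

- Records are hypothetical (topic, bytes) pairs.
- JSON stands in for real MCAP message encodings.
- Rules are keyed by exact topic rather than by glob.

What it shows is the ordering described above: the deny check looks only at the topic, and bytes are decoded only for topics that have a transform.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical, simplified record: (topic, raw message bytes).
Record = Tuple[str, bytes]


def rewrite(
    records: Iterable[Record],
    is_denied: Callable[[str], bool],
    transforms: Dict[str, Callable[[dict], dict]],
    decoded_topics: Optional[List[str]] = None,
) -> Iterator[Record]:
    """Read each record once and yield the records the output file should contain."""
    for topic, raw in records:
        if is_denied(topic):
            continue  # channel filter: dropped on the topic name alone, never decoded
        rule = transforms.get(topic)
        if rule is None:
            yield topic, raw  # no rule for this topic: original bytes pass through
            continue
        if decoded_topics is not None:
            decoded_topics.append(topic)  # track which topics actually got decoded
        message = json.loads(raw)  # JSON stands in for the real message decoder
        yield topic, json.dumps(rule(message)).encode()


decoded: List[str] = []
output = list(
    rewrite(
        [
            ("/audio/mic", b"\x00\x01\x02"),           # denied: bytes are never parsed
            ("/robot_status", b'{"data": "host-7"}'),  # has a transform: decoded
            ("/tf", b"not-json-at-all"),               # no rule: passed through as-is
        ],
        is_denied=lambda topic: topic.startswith("/audio/"),
        transforms={"/robot_status": lambda msg: {**msg, "data": "[REDACTED]"}},
        decoded_topics=decoded,
    )
)
```

Because rewrite is a generator, output records are produced as input is consumed, so the whole file never has to sit in memory. The /tf payload isn't valid JSON, and that's fine: with no rule for /tf, nothing ever tries to parse it.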
What you get
| Capability | What it does |
|---|---|
| Channel filter | Drop or allow whole topics by glob (/audio/**, /user/*). Runs before any decode — costs nothing. |
| Per-topic transforms | Rewrite individual messages with put (whitelist — spell out what you keep) or patch (denylist — override specific fields). Templates use Jinja2. |
| Metadata redaction | Same shape applied to MCAP metadata records (operator info, calibration, config blobs). |
| Regex redactors | Strip patterns out of string fields with regex_strip (one pattern + replacement) or regex_redact (apply a list of patterns, all matches → [REDACTED]). Bring your own patterns for emails, phone numbers, IPs, employee IDs, etc. |
| Hash salt rotation | The salt used by hash(...) is read from ${VAR} in the environment at config-load time. Rotate it by changing the env var and restarting edge-sync, with no rule edits. |
| Dry-run mode | Run the filter with upload_type: none and route both originals and redacted artefacts to <input_dir>/.dry-run/ to inspect locally before flipping it on. |
| Self-documenting audit | JSONL sidecar + embedded MCAP metadata record per file. Each carries rules_hash so you can prove later which rules version produced a given file. |
| Failure policy | Fail-closed by default. A record that fails a rule is dropped rather than uploaded unredacted, and skip_file withholds the whole file instead. pass_original (fail-open) is opt-in. |
When to use it
| You want to… | Use redaction? |
|---|---|
| Drop a whole topic the cloud should never see | Yes — channel filter (channels.deny) |
| Hash an operator's name in metadata for compliance | Yes — metadata: rule with hash(...) |
| Replace one field in a known message with a constant | Yes — inline patch rule |
| Convert a high-bandwidth topic into a tiny summary | Yes — put rule with a Jinja template |
| Just stop recording the topic in the first place | No — narrow your recorder's topic list (Manage → Recording configuration) |
The recorder's topic list is your first line of defence — if a topic shouldn't be recorded at all, drop it there. Use redaction when you do need to record a topic (for replay, scenarios, or local diagnostics) but want to scrub something out before upload.
How it fits together
Redaction is configured in two files:
- edge-sync.yaml: a redaction pipeline step (v0.8+) or the legacy redaction: block (v0.7, still accepted), plus a lifecycle: policy. It points at the rules file, sets the failure policy and audit behavior, and controls what happens to the original and redacted artefacts after upload.
- redaction.yaml: the rules themselves. This covers channel allow/deny, per-topic transforms, metadata-record transforms, and named functions. This file is identical across v0.7 and v0.8.
The rules file is a separate file because operators rotate it independently of edge-sync.yaml (different review cadence, sometimes different reviewers).
```text
recorder writes ──► /recordings/*.mcap ──► edge-sync picks up
                                                   │
                                                   ▼
                                            edge-transform
                                           (channel filter →
                                            per-topic transform)
                                                   │
                                                   ▼
                              /recordings/.alloy-redacted/*.mcap ──► upload
                              + audit JSONL line
                              + audit MCAP record (embedded)
```
Quick start
Step 1: Author a rules file
Create /etc/alloy/redaction.yaml:
```yaml
enabled: true

# Salt for hash(...), read from the environment at config-load time.
# See "Hash salts and rotation" below.
hash_salt: "${ALLOY_HASH_SALT}"

# Cheapest filter: drop topics before any decode.
channels:
  allow: ["*"]
  deny:
    - "/audio/**"
    - "/user/*"

# Per-topic mappings. First match wins.
transforms:
  # Replace the data field with a salted hash.
  - match: "/robot_status"
    transform:
      type: patch
      schema: "std_msgs/msg/String"
      overrides:
        data: '{{ original | hash(algo="sha256") }}'

# MCAP metadata records, separate from message channels.
metadata:
  - match: "operator_*"
    transform:
      type: patch
      overrides:
        operator_name: '{{ original | hash(algo="md5") }}'
        # Strip emails out of free-text notes; the replacement is the second argument.
        operator_notes: '{{ original | regex_strip("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}", "[EMAIL]") }}'
```
For multi-pattern stripping (emails, phone numbers, and IPs in the same field), use regex_redact with a list of patterns. It replaces every match with [REDACTED]:
```yaml
overrides:
  description: '{{ original | regex_redact([
    "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}",
    "\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b",
    "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"
    ]) }}'
```
The full schema (named functions, put vs patch, match selectors, hash salts, includes) lives in the redaction reference.
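The two regex helpers can be mimicked with Python's re module, which is handy for checking patterns before you put them in a rule. This is a sketch, not the redactor's implementation. It makes two assumptions:

- regex_redact applies its patterns one after another.
- The redactor's regex dialect agrees with Python's for these patterns.

The Jinja strings in the YAML double every backslash, so \\. in a rule is \. in the regex. Python raw strings need only one backslash.

```python
import re
from typing import List

EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
PHONE = r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"
IPV4 = r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b"


def regex_strip(text: str, pattern: str, replacement: str) -> str:
    # One pattern, caller-chosen replacement (same shape as the regex_strip filter).
    return re.sub(pattern, replacement, text)


def regex_redact(text: str, patterns: List[str]) -> str:
    # Assumed behavior: apply each pattern in turn; every match becomes [REDACTED].
    for pattern in patterns:
        text = re.sub(pattern, "[REDACTED]", text)
    return text


note = "Call 555-123-4567 or mail ops@example.com; robot at 10.0.0.12"
```

Running regex_redact(note, [EMAIL, PHONE, IPV4]) yields "Call [REDACTED] or mail [REDACTED]; robot at [REDACTED]". Running regex_strip on an address with the "[EMAIL]" replacement leaves just the marker.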
Step 2: Wire it into edge-sync.yaml
In v0.8+ redaction is a pipeline: step; the failure-policy and audit knobs move under lifecycle.transform. The v0.7 flat redaction: block still works in v0.8 (the loader auto-migrates it; run alloy-edge migrate to rewrite on disk).
v0.8+:

```yaml
version: 1
pipeline:
  - transform: /etc/alloy/redaction.yaml   # rules file from Step 1
    upload: true                           # upload the redacted artefact
    original_after: move                   # set the unredacted original aside after upload
lifecycle:
  original:
    move_to: /var/lib/alloy/edge-sync/originals
  transform:
    on_rule_error: skip_file   # drop the file if a rule fails (alt: skip_record [default] / pass_original)
    audit:
      jsonl_path: /var/lib/alloy/edge-sync/redaction-audit.jsonl   # set to enable the JSONL sidecar; omit to disable
      embed_in_mcap: true   # also write the audit summary inside the redacted file
```

v0.7 (legacy redaction: block, still accepted):

```yaml
redaction:
  enabled: true
  rules_file: /etc/alloy/redaction.yaml
  on_rule_error: skip_file   # drop the file (alt: skip_record [default: drop bad records, keep filtering] / pass_original)
  audit:
    jsonl_path: /var/lib/alloy/edge-sync/redaction-audit.jsonl   # set to enable the JSONL sidecar; omit to disable
    embed_in_mcap: true   # also write the audit summary inside the redacted file
lifecycle:
  original:
    after: move
    move_to: /var/lib/alloy/edge-sync/originals
  redacted:
    after: keep
```
Step 3: Dry-run before flipping it on
Run the pipeline without uploading anything. Set upload_type: none in edge-sync.yaml, and use lifecycle to route the originals and the redacted artefacts into a .dry-run/ sandbox. Inspect the output, confirm it matches what you expect, then revert to normal uploads.
v0.8+:

```yaml
upload_type: none   # filter runs, nothing leaves the device
version: 1
pipeline:
  - transform: /etc/alloy/redaction.yaml
    original_after: move
lifecycle:
  original:
    after: move
    move_to: .dry-run/originals   # v0.8 rejects a shared move_to, so keep the two distinct
  transform:
    after: move
    move_to: .dry-run/redacted
```

v0.7 (legacy redaction: block):

```yaml
upload_type: none   # filter runs, nothing leaves the device
redaction:
  enabled: true
  rules_file: /etc/alloy/redaction.yaml
lifecycle:
  original:
    after: move
    move_to: .dry-run   # relative to input_dir
  redacted:
    after: move
    move_to: .dry-run
```

For a one-shot run without editing the file, you can instead pass --dry-run to the CLI. It applies the same overrides at load time. It is also exempt from the v0.8 shared-move_to check and routes both sets of files into one timestamped .dry-run/ subdirectory:
```shell
alloy-edge sync --config /etc/alloy/edge-sync.yaml --dry-run
```
Both the unredacted originals and the redacted artefacts land under <input_dir>/.dry-run/. Inspect them as you would any MCAP file: use mcap info or mcap cat, or drag and drop them into the web uploader to open them in Foxglove / Alloy Replay.
Step 4: Enable and watch
Once the dry-run output looks right, remove the dry-run overrides. Restore upload_type to your normal value, and either drop the lifecycle block or set the actions you want in steady state. Then restart edge-sync and tail the audit log:
```shell
tail -f /var/lib/alloy/edge-sync/redaction-audit.jsonl
```
Each line is one file's redaction summary. It includes:

- messages and bytes in/out
- per-channel dropped and transformed counts
- denied channels
- per-helper match counts
- the rules_hash (sha256:…) of the config that produced it
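Because each line is a standalone JSON object, the log is easy to post-process. The sketch below counts how many files each rules version produced, which is handy right after a rules rollout. It relies only on the rules_hash field named on this page; the other field names aren't specified here, so it doesn't touch them.

```python
import json
from collections import Counter
from typing import Iterable


def files_per_rules_hash(lines: Iterable[str]) -> Counter:
    """Count audit entries (one per redacted file) by the rules_hash that produced them."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        counts[entry.get("rules_hash", "<missing>")] += 1
    return counts


def summarize(path: str = "/var/lib/alloy/edge-sync/redaction-audit.jsonl") -> None:
    # Print one row per rules version, most frequent first.
    with open(path, encoding="utf-8") as fh:
        for rules_hash, n in files_per_rules_hash(fh).most_common():
            print(f"{n:6d}  {rules_hash}")
```

Seeing an older rules_hash still showing up after a rollout usually means an edge-sync instance hasn't been restarted with the new rules file.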
The two transform shapes — put vs patch
Each per-topic rule is one of two shapes. Pick based on how stable the upstream schema is.
put — whitelist (safe by default)
Spell out the entire output. Anything not mentioned in the template disappears. New upstream fields don't leak.
```yaml
- match: "/operator/command"
  transform:
    type: put
    schema: "your_msgs/OperatorCommand"
    template: |-
      {
        "operator_id": {{ operator_id | hash(algo="sha256") | tojson }},
        "command": {{ command | tojson }},
        "timestamp": {{ timestamp | tojson }}
      }
```
operator_id is hashed, and command and timestamp pass through. Any other upstream field (notes, location, payload) disappears. That's the put semantic: you spell out what you keep, and everything else drops.
Use put for compliance channels — anything where "this field appeared upstream and we didn't redact it" would be a problem.
patch — denylist (concise)
Pass everything through, override only the listed fields. New upstream fields flow through unchanged.
```yaml
- match: "/robot_status"
  transform:
    type: patch
    schema: "std_msgs/msg/String"
    overrides:
      data: '{{ original | hash(algo="sha256") }}'
```
Use patch when the schema is stable and you only need to neutralise one or two fields. If upstream adds a field you didn't anticipate, it flows through unchanged. That's the trade-off for the shorter rule.
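The difference between the two shapes only shows up when upstream changes. This Python sketch applies both shapes to a message that has gained a location field neither rule mentions. It uses plain dicts as stand-ins for decoded messages and a placeholder in place of the real hash(...) helper.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]


def apply_put(msg: Message, template: Callable[[Message], Message]) -> Message:
    # Whitelist: the output is exactly what the template builds.
    return template(msg)


def apply_patch(msg: Message, overrides: Dict[str, Callable[[Any], Any]]) -> Message:
    # Denylist: copy every field, then rewrite only the listed ones.
    # A listed field missing from msg raises KeyError, i.e. a runtime rule failure.
    out = dict(msg)
    for field, rewrite in overrides.items():
        out[field] = rewrite(msg[field])
    return out


def placeholder_hash(value: Any) -> str:
    # Stand-in for hash(algo="sha256"); not a real hash.
    return "<hashed>"


# Upstream has added "location", which neither rule mentions.
upstream = {"operator_id": "alice", "command": "stop", "timestamp": 17, "location": "bay 3"}

put_out = apply_put(
    upstream,
    lambda m: {
        "operator_id": placeholder_hash(m["operator_id"]),
        "command": m["command"],
        "timestamp": m["timestamp"],
    },
)
patch_out = apply_patch(upstream, {"operator_id": placeholder_hash})
```

put_out has no location key. patch_out still carries "bay 3", which is exactly the leak the put section warns about.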
Examples here use hash(algo="sha256") (salted) for de-identification. There's also sha256_short for content fingerprinting: the same value always hashes to the same 8-character digest. However, it is unsalted, so anyone with a wordlist of candidate values can recover the input by hashing each candidate and comparing. Don't use it for de-identification. See the helper reference.
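The attack described above takes only a few lines. This Python sketch rests on two assumptions:

- sha256_short is modeled as the first 8 hex characters of a plain SHA-256.
- HMAC-SHA256 stands in for the salted hash(...). How alloy-edge actually combines salt and value isn't specified on this page.

```python
import hashlib
import hmac
import secrets
from typing import Callable, List, Optional


def unsalted_short(value: str) -> str:
    # Assumed model of sha256_short: plain SHA-256, truncated to 8 hex chars.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]


def salted(value: str, salt: bytes) -> str:
    # Stand-in for hash(algo="sha256"): keyed with a secret salt via HMAC.
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(
    digest: str, candidates: List[str], hash_fn: Callable[[str], str]
) -> Optional[str]:
    # Hash every guess the same way and look for a match.
    for guess in candidates:
        if hash_fn(guess) == digest:
            return guess
    return None


roster = ["alice", "bob", "carol"]

# Unsalted: the attacker can compute the same function, so the name falls out.
recovered = dictionary_attack(unsalted_short("bob"), roster, unsalted_short)

# Salted: without the salt, hashing the same guesses matches nothing.
salt = secrets.token_bytes(32)
protected = salted("bob", salt)
guessed = dictionary_attack(
    protected, roster, lambda v: hashlib.sha256(v.encode("utf-8")).hexdigest()
)
```

The salt only protects values while it stays secret. Anyone holding it can run the same loop against salted digests.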
When a rule fails
Templates can break because of a Jinja syntax error, or because an upstream schema change removes a field a template references. There are two distinct failure points:
- Config-load time: bad Jinja syntax (a typo, an unclosed tag, an unknown filter) rejects the rules file before edge-sync starts. You see this immediately, not at upload time.
- Runtime: a syntactically valid template fails on a particular message (e.g. {{ status[0].hardware_id }} when a message has an empty status). Behaviour is governed by on_rule_error:
  - on_rule_error: skip_record (default): the offending record is dropped, and the rest of the file is filtered and uploaded. This keeps one bad message from sinking the whole file.
  - on_rule_error: skip_file: the file is not uploaded when a rule fails. Pair with lifecycle.original.after: move if you want to retain failed originals for operator review.
  - on_rule_error: pass_original: fail-open, meaning the unredacted original uploads. This requires explicit opt-in because it can leak. Use it only when "drop nothing on the floor" beats "leak nothing", and document why.
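The three runtime policies differ only in what happens after a rule throws. This Python sketch models that decision. A plain function stands in for a Jinja template, and Python exceptions stand in for template errors. Unlike the real redactor, which streams records and keeps the original file on disk, the sketch holds the records in a list.

```python
from typing import Callable, List, Optional


def redact_records(
    records: List[dict],
    rule: Callable[[dict], dict],
    on_rule_error: str = "skip_record",
) -> Optional[List[dict]]:
    """Return what would be uploaded for one file, or None if nothing uploads."""
    if on_rule_error not in ("skip_record", "skip_file", "pass_original"):
        raise ValueError(f"unknown on_rule_error: {on_rule_error!r}")
    out: List[dict] = []
    for record in records:
        try:
            out.append(rule(record))
        except (KeyError, IndexError, TypeError):
            if on_rule_error == "skip_record":
                continue  # drop just this record, keep filtering the rest
            if on_rule_error == "skip_file":
                return None  # withhold the whole file
            return records  # pass_original: fail-open, unredacted data uploads
    return out


def keep_first_hardware_id(msg: dict) -> dict:
    # Mirrors {{ status[0].hardware_id }}: IndexError when status is empty.
    return {"hardware_id": msg["status"][0]["hardware_id"]}


messages = [
    {"status": [{"hardware_id": "imu0"}], "operator": "alice"},
    {"status": [], "operator": "bob"},
    {"status": [{"hardware_id": "cam1"}], "operator": "carol"},
]
```

With the default policy, only the second message is lost. With pass_original, all three operator names that the rule was meant to strip go out, which is the leak the opt-in guards against.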
The audit trail
Every redacted file gets two parallel audit records:
| Carrier | Where | Why |
|---|---|---|
| JSONL sidecar | {state_dir}/redaction-audit.jsonl (one line per file) | Cheap to grep, cheap to stream — the operational logbook |
| Embedded MCAP record | A metadata record named alloy.redaction.audit inside the redacted file | The redacted file documents itself — survives re-uploads, copies, and forwards |
The two carriers contain the same JSON shape. Both default on. Turn embed_in_mcap off only when the rule layout itself is sensitive (e.g. you don't want the redacted file to disclose which topics were touched).
Each audit entry includes a rules_hash (formatted sha256:<hex>) over the merged config so you can prove later which version of the rules produced a given file.
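This page only says that rules_hash is a SHA-256 over the merged config. The sketch below shows the general idea with an assumed canonicalization (sorted keys, compact separators), so reordering keys in the YAML doesn't change the hash. It won't reproduce alloy-edge's actual values, but it shows why a hash over normalized config works as a stable version identifier.

```python
import hashlib
import json
from typing import Any, Dict


def rules_hash(merged_config: Dict[str, Any]) -> str:
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8.
    canonical = json.dumps(
        merged_config, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"enabled": True, "channels": {"deny": ["/audio/**"], "allow": ["*"]}}
b = {"channels": {"allow": ["*"], "deny": ["/audio/**"]}, "enabled": True}
c = {"enabled": True, "channels": {"deny": ["/audio/**", "/user/*"], "allow": ["*"]}}
```

In this model, a and b hash identically, while c differs. List order still matters, which fits rules where first match wins.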
Hash salts and rotation
{{ original | hash(algo="md5") }} and friends use a salt configured at the top of redaction.yaml:
```yaml
hash_salt: "${ALLOY_HASH_SALT}"
```
The ${VAR} form is interpolated from the environment at config-load time. Rotate the salt by changing the env var and restarting edge-sync; no rule edit is needed. The audit log records hash_salt_fingerprint: sha256(salt)[:8] per file, so a compliance reviewer can prove a rotation happened without ever seeing the salt.
If the env var isn't set when the config loads, the loader fails fast — you can't accidentally ship with a missing salt.
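This Python sketch models the fail-fast interpolation and the fingerprint using the standard library's string.Template, whose ${VAR} syntax matches the form above. It makes two assumptions:

- The real loader's interpolation rules may differ in edge cases.
- sha256(salt)[:8] means the first 8 hex characters of the digest.

```python
import hashlib
import os
from string import Template
from typing import Mapping


def load_salt(raw: str, env: Mapping[str, str]) -> str:
    # substitute() raises KeyError for an unset ${VAR}: fail fast, never an empty salt.
    return Template(raw).substitute(env)


def salt_fingerprint(salt: str) -> str:
    # Identifies which salt was in use without storing it. This assumes the salt
    # has enough entropy that it can't be guessed back from 8 hex characters.
    return hashlib.sha256(salt.encode("utf-8")).hexdigest()[:8]


def load_salt_from_environment(raw: str = "${ALLOY_HASH_SALT}") -> str:
    return load_salt(raw, os.environ)
```

Changing ALLOY_HASH_SALT changes the fingerprint, which is the evidence of rotation a reviewer looks for in the audit log.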
Channel filter is the fastest
Rules under transforms: decode the message; rules under channels.deny don't. If a topic should never leave the robot, deny it at the channel filter — it costs nothing and never decodes.
Only reach for a transform: when you need to keep some part of a message and remove or rewrite the rest.
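The channel decision needs nothing but the topic string, which is why it can run before decode. This page doesn't define the glob grammar, so the Python sketch below assumes:

- ** spans any number of path segments, and * stays within one.
- A bare * matches every topic, as allow: ["*"] in the quick start implies.
- A deny match beats an allow match.

```python
import re
from functools import lru_cache
from typing import List


@lru_cache(maxsize=None)
def glob_to_regex(pattern: str) -> re.Pattern:
    # Assumed grammar: "**" spans path segments, "*" stays inside one segment.
    if pattern in ("*", "**"):
        return re.compile(r".*")  # bare wildcard: match every topic
    parts: List[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def channel_allowed(topic: str, allow: List[str], deny: List[str]) -> bool:
    # Only the topic string is inspected; message bytes are never touched.
    # Assumed precedence: any deny match wins over an allow match.
    if any(glob_to_regex(p).fullmatch(topic) for p in deny):
        return False
    return any(glob_to_regex(p).fullmatch(topic) for p in allow)


# The channel filter from the quick-start rules file.
ALLOW = ["*"]
DENY = ["/audio/**", "/user/*"]
```

Under these assumptions, /user/* would not catch /user/prefs/theme; /user/** would. Check the redaction reference for the actual matching rules before relying on single-segment behavior.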
Limits and things to know
- MCAP only. Other recording formats aren't supported by the redactor.
- First match wins. Order rules from most-specific to most-general.
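Because the first match wins, a broad rule listed early shadows a specific rule listed after it. The Python sketch below models the selection step. For brevity, it uses fnmatch-style patterns (where * also crosses /); this is an assumption, not the redactor's selector grammar.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional


def select_rule(topic: str, rules: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    # Walk rules in file order; the first matching rule wins, later ones are ignored.
    for rule in rules:
        if fnmatchcase(topic, rule["match"]):
            return rule
    return None


general_first = [
    {"match": "/robot/*", "name": "generic"},
    {"match": "/robot/status", "name": "status-specific"},
]
specific_first = list(reversed(general_first))
```

With general_first, /robot/status never reaches its dedicated rule. With specific_first, it does, and other /robot/ topics still fall through to the generic rule.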
Next steps
- Redaction reference: the full redaction.yaml schema, covering every field, the function library, and the match selector forms.
- Configuration reference: the redaction pipeline step (and the legacy redaction: block) in edge-sync.yaml, plus lifecycle retention controls. The step covers the rules file path, output/audit knobs, and failure policy.