Redact

Strip or hash sensitive fields out of MCAP recordings on the device, before they leave your network

This feature is in beta. Schema and CLI may change between releases — pin your alloy-edge version when authoring rules for production.

Some recordings carry data you don't want leaving the robot — operator names in metadata, microphone audio on /audio/**, hostnames echoed in std_msgs/String topics. Alloy Edge can rewrite or drop those records on the device before edge-sync uploads, so the cloud only ever sees the sanitised version.

The redactor is a streaming MCAP rewriter. It reads each record once, applies any matching rules, and writes a new file alongside the original — no buffering, no decode for channels you didn't list, and a self-documenting audit trail per file.

What you get

Capability	What it does
Channel filter	Drop or allow whole topics by glob (`/audio/**`, `/user/*`). Runs before any decode, so denied topics add no decode cost.
Per-topic transforms	Rewrite individual messages with `put` (whitelist — spell out what you keep) or `patch` (denylist — override specific fields). Templates use Jinja2.
Metadata redaction	Same shape applied to MCAP metadata records (operator info, calibration, config blobs).
Regex redactors	Strip patterns out of string fields with `regex_strip` (one pattern + replacement) or `regex_redact` (apply a list of patterns, all matches → `[REDACTED]`). Bring your own patterns for emails, phone numbers, IPs, employee IDs, etc.
Hash salt rotation	`hash(...)` reads `${VAR}` from the environment at config-load time. Rotate by changing the env var — no rule edits.
Dry-run mode	Write filtered output to `<input_dir>/.dry-run/<timestamp>/`, skip uploads, and inspect locally before enabling uploads.
Self-documenting audit	JSONL sidecar + embedded MCAP metadata record per file. Each carries `rules_hash` so you can prove later which rules version produced a given file.
Failure policy	Fail-closed by default: a record whose rule fails is dropped rather than uploaded unredacted. `skip_file` quarantines the whole file instead; fail-open `pass_original` is opt-in.

When to use it

You want to…	Use redaction?
Drop a whole topic the cloud should never see	Yes — channel filter (`channels.deny`)
Hash an operator's name in metadata for compliance	Yes — `metadata:` rule with `hash(...)`
Replace one field in a known message with a constant	Yes — inline `patch` rule
Convert a high-bandwidth topic into a tiny summary	Yes — `put` rule with a Jinja template
Just stop recording the topic in the first place	No — narrow your recorder's topic list (Manage → Recording configuration)

The recorder's topic list is your first line of defence — if a topic shouldn't be recorded at all, drop it there. Use redaction when you do need to record a topic (for replay, scenarios, or local diagnostics) but want to scrub something out before upload.

How it fits together

Redaction is configured in two files:

edge-sync.yaml — the redaction: block. Points at the rules file, sets the failure policy, configures the audit log and quarantine directory.
redaction.yaml — the rules themselves. Channel allow/deny, per-topic transforms, metadata-record transforms, named functions.

The rules file is a separate file because operators rotate it independently of edge-sync.yaml (different review cadence, sometimes different reviewers).

recorder writes ──► /recordings/*.mcap ──► edge-sync picks up
                                                │
                                                ▼
                                          edge-transform
                                          (channel filter →
                                           per-topic transform)
                                                │
                                                ▼
                                          redacted *.mcap      ──► upload
                                          + audit JSONL line
                                          + audit MCAP record (embedded)

Quick start

Step 1: Author a rules file

Create /etc/alloy/redaction.yaml:

enabled: true

# Cheapest filter — drop topics before any decode.
channels:
  allow: ["*"]
  deny:
    - "/audio/**"
    - "/user/*"

# Per-topic mappings. First match wins.
transforms:
  # Replace the data field with a hash.
  - match: "/robot_status"
    transform:
      type: patch
      schema: "std_msgs/msg/String"
      overrides:
        data: '{{ original | hash(algo="sha256") }}'

# MCAP metadata records — separate from message channels.
metadata:
  - match: "operator_*"
    transform:
      type: patch
      overrides:
        operator_name: '{{ original | hash(algo="md5") }}'
        # Strip emails out of free-text notes — replacement is the second arg.
        operator_notes: '{{ original | regex_strip("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}", "[EMAIL]") }}'

For multi-pattern stripping (emails and phone numbers and IPs in the same field), use regex_redact with a pattern list — it replaces every match with [REDACTED]:

overrides:
  description: '{{ original | regex_redact([
    "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}",
    "\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b",
    "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"
  ]) }}'
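
To sanity-check a pattern list before putting it in a rule, the same idea can be sketched in plain Python. This is an illustration only: `redact_all` is a hypothetical stand-in, and it assumes `regex_redact` applies patterns in order with Python-style regex semantics, which this page does not state.

```python
import re

# Patterns copied from the rule above. Single backslashes here, because
# there is no YAML or Jinja string escaping in a Python raw string.
PATTERNS = [
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",   # email
    r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",                     # 10-digit phone
    r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b",            # IPv4-shaped
]

def redact_all(text: str, patterns: list, replacement: str = "[REDACTED]") -> str:
    """Hypothetical stand-in for regex_redact: apply each pattern in turn."""
    for pattern in patterns:
        text = re.sub(pattern, replacement, text)
    return text

sample = "mail bob@example.com or call 555-123-4567 from 10.0.0.12"
redacted = redact_all(sample, PATTERNS)
print(redacted)
```

Running candidate patterns against a few real strings from your own recordings is a cheap way to catch over-matching before a dry run.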

The full schema (named functions, put vs patch, match selectors, hash salts, includes) lives in the redaction reference.

Step 2: Wire it into `edge-sync.yaml`

redaction:
  enabled: true
  rules_file: /etc/alloy/redaction.yaml
  on_rule_error: skip_file         # quarantine the whole file, no upload (alternatives: skip_record [default: drop bad records, keep filtering], pass_original)
  quarantine:
    mode: keep                     # keep | delete; keep preserves originals so operators can review
    dir: /var/lib/alloy/edge-sync/quarantine
  audit:
    jsonl_path: /var/lib/alloy/edge-sync/redaction-audit.jsonl   # set to enable the JSONL sidecar; omit to disable
    embed_in_mcap: true                                          # also write the audit summary inside the redacted file

Step 3: Dry-run before flipping it on

Set dry_run: true in redaction.yaml (or pass --dry-run to alloy-edge sync). The redactor writes filtered output to <input_dir>/.dry-run/<timestamp>/ (alongside the watched recordings) and skips uploads entirely. Inspect the result, confirm it matches expectations, then turn dry-run off.

alloy-edge sync --config /etc/alloy/edge-sync.yaml --dry-run

Inspect the dry-run output the same way you inspect any MCAP — mcap info, mcap cat, or open it in Foxglove / Alloy Replay (drag-drop into the web uploader).

Step 4: Enable and watch

Once dry-run looks right, set dry_run: false, restart edge-sync, and tail the audit log:

tail -f /var/lib/alloy/edge-sync/redaction-audit.jsonl

Each line is one file's redaction summary — messages and bytes in/out, per-channel dropped + transformed counts, denied channels, per-helper match counts, plus the rules_hash (sha256:…) of the config that produced it.
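
Because the sidecar is one JSON object per line, it is easy to summarise with a few lines of script. The sketch below counts files per `rules_hash`, which shows at a glance when a rules change took effect. The sample lines are made up, and the `file` key is a hypothetical placeholder; only `rules_hash` is a field name taken from this page.

```python
import io
import json
from collections import Counter

# Hypothetical audit lines standing in for redaction-audit.jsonl.
sample_log = io.StringIO(
    '{"file": "a.mcap", "rules_hash": "sha256:aaa"}\n'
    '{"file": "b.mcap", "rules_hash": "sha256:aaa"}\n'
    '{"file": "c.mcap", "rules_hash": "sha256:bbb"}\n'
)

def files_per_rules_hash(lines) -> Counter:
    """Count audit entries per rules_hash, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["rules_hash"]] += 1
    return counts

counts = files_per_rules_hash(sample_log)
print(counts)
```

To run it against the real log, pass an open file handle instead of the `io.StringIO` sample.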

The two transform shapes — `put` vs `patch`

Each per-topic rule is one of two shapes. Pick based on how stable the upstream schema is.

`put` — whitelist (safe by default)

Spell out the entire output. Anything not mentioned in the template disappears. New upstream fields don't leak.

- match: "/operator/command"
  transform:
    type: put
    schema: "your_msgs/OperatorCommand"
    template: |-
      {
        "operator_id": {{ operator_id | hash(algo="sha256") | tojson }},
        "command": {{ command | tojson }},
        "timestamp": {{ timestamp | tojson }}
      }

operator_id is hashed; command and timestamp pass through. Any other field upstream — notes, location, payload — disappears. That's the put semantic: spell out what you keep, everything else drops.

Use put for compliance channels — anything where "this field appeared upstream and we didn't redact it" would be a problem.

`patch` — denylist (concise)

Pass everything through, override only the listed fields. New upstream fields flow through unchanged.

- match: "/robot_status"
  transform:
    type: patch
    schema: "std_msgs/msg/String"
    overrides:
      data: '{{ original | hash(algo="sha256") }}'

Use patch when the schema is stable and you only need to neutralise one or two fields. If upstream adds a new field you didn't anticipate, it'll flow through — that's the trade-off for the shorter rule.
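
The difference between the two shapes is easiest to see on a plain dictionary. The following is a minimal sketch of the semantics described above, not the redactor's implementation; `apply_put`, `apply_patch`, and the sample message are hypothetical.

```python
def apply_put(message: dict, keep: dict) -> dict:
    """Allowlist: the output contains only the fields named in `keep`.

    `keep` maps a field name to a function applied to its value.
    """
    return {field: fn(message[field]) for field, fn in keep.items()}

def apply_patch(message: dict, overrides: dict) -> dict:
    """Denylist: copy everything, then rewrite only the listed fields."""
    out = dict(message)
    for field, fn in overrides.items():
        out[field] = fn(message[field])
    return out

def mask(value):
    return "<hashed>"      # placeholder for a real hash helper

def same(value):
    return value           # pass the value through unchanged

# Upstream has just added a `notes` field that no rule mentions.
msg = {"operator_id": "alice", "command": "stop", "notes": "call me on 555-0100"}

put_out = apply_put(msg, {"operator_id": mask, "command": same})
patch_out = apply_patch(msg, {"operator_id": mask})

print(put_out)     # no `notes` key
print(patch_out)   # `notes` flows through untouched
```

In this sketch the unexpected `notes` field is absent from the `put` output and present in the `patch` output, which is the trade-off between the two shapes.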

Examples here use hash(algo="sha256") (salted) for de-identification. There's also sha256_short for content fingerprinting: the same value always hashes to the same 8-char digest, but it is unsalted, so anyone with a wordlist of likely values can reverse it. It is not safe for de-identification. See the helper reference.
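
To see why an unsalted digest is not de-identification, consider the sketch below. It assumes the short fingerprint is the first 8 hex characters of SHA-256 and uses HMAC-SHA256 as a stand-in for a salted hash; neither detail is confirmed by this page, and all function names are hypothetical.

```python
import hashlib
import hmac
from typing import Callable, List, Optional

def short_fingerprint(value: str) -> str:
    """Unsalted, 8 hex chars: the same input gives the same digest for everyone."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]

def salted_hash(value: str, salt: str) -> str:
    """Keyed hash. HMAC-SHA256 is an assumption, not the documented scheme."""
    return hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()

def guess(digest: str, wordlist: List[str], hasher: Callable[[str], str]) -> Optional[str]:
    """Return the wordlist entry whose hash equals `digest`, if any."""
    return next((word for word in wordlist if hasher(word) == digest), None)

wordlist = ["alice", "bob", "carol"]

leaked_unsalted = short_fingerprint("bob")
leaked_salted = salted_hash("bob", salt="secret-known-only-on-device")

# An attacker can run the unsalted function themselves and match the digest.
recovered = guess(leaked_unsalted, wordlist, short_fingerprint)

# Without the salt, the attacker can only hash with a wrong key.
not_recovered = guess(leaked_salted, wordlist, lambda w: salted_hash(w, salt="attacker-guess"))

print(recovered, not_recovered)
```

The unsalted digest is matched by simply hashing every candidate name, while the salted one cannot be matched without the secret.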

When a rule fails

Templates can break — a Jinja syntax error, a schema change that removes a field reference. Two distinct failure points:

Config-load time: bad Jinja syntax (typo, unclosed tag, unknown filter) rejects the rules file before edge-sync starts. You'll see this immediately — not at upload time.
Runtime: a syntactically-valid template that fails on a particular message (e.g. {{ status[0].hardware_id }} when a message has empty status). Behaviour is governed by on_rule_error:
- on_rule_error: skip_record (default) — the offending record is dropped, the rest of the file is filtered and uploaded. One bad message doesn't sink the whole file.
- on_rule_error: skip_file — the original lands in the quarantine directory and is not uploaded. Safest. The operator fixes the rule, removes the file from quarantine, and edge-sync retries on its next sweep.
- on_rule_error: pass_original — fail-open: the unredacted original uploads. Requires explicit opt-in because it can leak. Use only when "drop nothing on the floor" beats "leak nothing", and document why.
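
The three runtime policies can be modelled in a few lines. This is a simplified sketch of the behaviour described above, with hypothetical names (`redact_file`, `first_hardware_id`); the real redactor streams MCAP records rather than working on a list.

```python
def redact_file(records, transform, on_rule_error="skip_record"):
    """Return (action, records_to_upload) for one file.

    `transform` may raise on a record; the policy decides what happens next.
    """
    out = []
    for record in records:
        try:
            out.append(transform(record))
        except Exception:
            if on_rule_error == "skip_record":
                continue                         # drop this record, keep going
            if on_rule_error == "skip_file":
                return "quarantine", []          # nothing is uploaded
            if on_rule_error == "pass_original":
                return "upload", list(records)   # fail-open: unredacted originals
            raise ValueError(f"unknown policy: {on_rule_error}")
    return "upload", out

def first_hardware_id(record):
    # Mirrors {{ status[0].hardware_id }}: fails when status is empty.
    return {"hardware_id": record["status"][0]["hardware_id"]}

records = [
    {"status": [{"hardware_id": "imu-1"}]},
    {"status": []},                              # this one breaks the template
    {"status": [{"hardware_id": "imu-2"}]},
]

for policy in ("skip_record", "skip_file", "pass_original"):
    print(policy, redact_file(records, first_hardware_id, policy))
```

Note that only `pass_original` ever returns unredacted records, which is why it requires explicit opt-in.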

The audit trail

Every redacted file gets two parallel audit records:

Carrier	Where	Why
JSONL sidecar	`{state_dir}/redaction-audit.jsonl` (one line per file)	Cheap to grep, cheap to stream — the operational logbook
Embedded MCAP record	A metadata record named `alloy.redaction.audit` inside the redacted file	The redacted file documents itself — survives re-uploads, copies, and forwards

The two carriers contain the same JSON shape. Both default on. Turn embed_in_mcap off only when the rule layout itself is sensitive (e.g. you don't want the redacted file to disclose which topics were touched).

Each audit entry includes a rules_hash (formatted sha256:<hex>) over the merged config so you can prove later which version of the rules produced a given file.
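
One plausible way to build such a hash is to serialise the merged config canonically and digest the bytes. The sketch below does that with sorted-key JSON; the canonical form is an assumption, since this page only states that the hash covers the merged config and is formatted `sha256:<hex>`. It will not necessarily reproduce the values Alloy Edge writes.

```python
import hashlib
import json

def rules_hash(merged_config: dict) -> str:
    """sha256:<hex> over a canonical serialisation of the config.

    Canonical JSON (sorted keys, no extra whitespace) is an assumption.
    """
    canonical = json.dumps(merged_config, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

v1 = {"enabled": True, "channels": {"allow": ["*"], "deny": ["/audio/**"]}}
v1_reordered = {"channels": {"deny": ["/audio/**"], "allow": ["*"]}, "enabled": True}
v2 = {"enabled": True, "channels": {"allow": ["*"], "deny": ["/audio/**", "/user/*"]}}

print(rules_hash(v1) == rules_hash(v1_reordered))   # key order does not matter
print(rules_hash(v1) == rules_hash(v2))             # any rule change does
```

The useful property is that cosmetic differences such as key order leave the hash unchanged, while any change to a rule produces a new one.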

Hash salts and rotation

{{ original | hash(algo="md5") }} and friends use a salt configured at the top of redaction.yaml:

hash_salt: "${ALLOY_HASH_SALT}"

The ${VAR} form is interpolated from the environment at config-load time. Rotate by changing the env var and restarting edge-sync — no rule edit needed. The audit log records hash_salt_fingerprint: sha256(salt)[:8] per file so a compliance reviewer can prove a rotation happened without ever seeing the salt.

If the env var isn't set when the config loads, the loader fails fast — you can't accidentally ship with a missing salt.
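
The fail-fast load and the fingerprint can be sketched as follows. This assumes the fingerprint is the first 8 hex characters of SHA-256 over the UTF-8 salt, and that an empty value counts as unset; both are assumptions, and `load_salt` and `salt_fingerprint` are hypothetical names.

```python
import hashlib

def load_salt(env: dict, var: str = "ALLOY_HASH_SALT") -> str:
    """Fail fast when the variable is missing (or, by assumption, empty)."""
    salt = env.get(var)
    if not salt:
        raise RuntimeError(f"{var} is not set; refusing to load rules")
    return salt

def salt_fingerprint(salt: str) -> str:
    """sha256(salt)[:8], read here as the first 8 hex characters."""
    return hashlib.sha256(salt.encode("utf-8")).hexdigest()[:8]

# Two environments standing in for "before" and "after" a rotation.
before = salt_fingerprint(load_salt({"ALLOY_HASH_SALT": "salt-2024"}))
after = salt_fingerprint(load_salt({"ALLOY_HASH_SALT": "salt-2025"}))

print(before != after)   # the fingerprint changes, the salt itself is never logged
```

In real use the `env` argument would be `os.environ`. A reviewer comparing fingerprints across audit entries can see that the salt changed without learning either value.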

Channel filter is the fastest

Rules under transforms: decode the message; rules under channels.deny don't. If a topic should never leave the robot, deny it at the channel filter — it costs nothing and never decodes.

Only reach for a transform: when you need to keep some part of a message and remove or rewrite the rest.
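
The cost difference comes from where the check happens in the pipeline. In the sketch below, a denied topic is skipped on its name alone, so its payload is never decoded. Python's `fnmatch` stands in for the redactor's glob matcher, which is an assumption; the actual glob dialect, including how `**` behaves, is defined in the redaction reference.

```python
from fnmatch import fnmatchcase

DENY = ["/audio/*"]
decoded_topics = []          # records which payloads were actually decoded

def decode(topic: str, payload: bytes) -> str:
    decoded_topics.append(topic)
    return payload.decode("utf-8")

def filter_stream(records, deny):
    """Yield (topic, decoded) pairs, skipping denied topics before decode."""
    for topic, payload in records:
        if any(fnmatchcase(topic, pattern) for pattern in deny):
            continue                       # dropped without touching the payload
        yield topic, decode(topic, payload)

# The audio payload is not valid UTF-8 and would raise if it were decoded.
stream = [("/audio/mic0", b"\xff\xfe raw pcm"), ("/robot_status", b"ok")]
kept = list(filter_stream(stream, DENY))

print(kept, decoded_topics)
```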

Limits and things to know

MCAP only. Other recording formats aren't supported by the redactor.
First match wins. Order rules from most-specific to most-general.
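
A short sketch shows why ordering matters under first-match-wins. It uses Python's `fnmatch` as a stand-in matcher (an assumption; in `fnmatch`, `*` also matches `/`), and the topic and rule names are hypothetical.

```python
from fnmatch import fnmatchcase

def first_match(topic, rules):
    """Return the name of the first rule whose pattern matches, else None."""
    for pattern, name in rules:
        if fnmatchcase(topic, pattern):
            return name
    return None

specific_first = [
    ("/camera/front/image", "put-front"),
    ("/camera/*", "patch-any-camera"),
]
general_first = list(reversed(specific_first))

chosen_good = first_match("/camera/front/image", specific_first)
chosen_bad = first_match("/camera/front/image", general_first)

print(chosen_good)   # the specific rule applies
print(chosen_bad)    # the general rule shadows the specific one
```

With the general pattern listed first, the specific rule is never reached, so a strict `put` rule could be silently replaced by a looser `patch`.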

Next steps

Redaction reference — full redaction.yaml schema: every field, the function library, and the match selector forms.
Configuration reference — the redaction: block in edge-sync.yaml (rules file path, quarantine, audit, failure policy).
