
Redact

Strip or hash sensitive fields out of MCAP recordings on the device, before they leave your network

This feature is in beta. Schema and CLI may change between releases — pin your alloy-edge version when authoring rules for production.

Some recordings carry data you don't want leaving the robot — operator names in metadata, microphone audio on /audio/**, hostnames echoed in std_msgs/String topics. Alloy Edge can rewrite or drop those records on the device before edge-sync uploads, so the cloud only ever sees the sanitised version.

The redactor is a streaming MCAP rewriter. It reads each record once, applies any matching rules, and writes a new file alongside the original — no buffering, no decode for channels you didn't list, and a self-documenting audit trail per file.

What you get

| Capability | What it does |
| --- | --- |
| Channel filter | Drop or allow whole topics by glob (/audio/**, /user/*). Runs before any decode, so denied topics add no decode cost. |
| Per-topic transforms | Rewrite individual messages with put (whitelist: spell out what you keep) or patch (denylist: override specific fields). Templates use Jinja2. |
| Metadata redaction | Same shape applied to MCAP metadata records (operator info, calibration, config blobs). |
| Regex redactors | Strip patterns out of string fields with regex_strip (one pattern + replacement) or regex_redact (apply a list of patterns, all matches → [REDACTED]). Bring your own patterns for emails, phone numbers, IPs, employee IDs, etc. |
| Hash salt rotation | hash(...) uses a salt read from ${VAR} in the environment at config-load time. Rotate by changing the env var; no rule edits. |
| Dry-run mode | Write filtered output to <input_dir>/.dry-run/, skip uploads, inspect locally before flipping it on. |
| Self-documenting audit | JSONL sidecar + embedded MCAP metadata record per file. Each carries rules_hash so you can prove later which rules version produced a given file. |
| Failure policy | Fail-closed by default: a record whose rule fails is dropped (skip_record), never uploaded unredacted. skip_file quarantines the whole file instead; pass_original (fail-open) is opt-in. |

When to use it

| You want to… | Use redaction? |
| --- | --- |
| Drop a whole topic the cloud should never see | Yes: channel filter (channels.deny) |
| Hash an operator's name in metadata for compliance | Yes: metadata: rule with hash(...) |
| Replace one field in a known message with a constant | Yes: inline patch rule |
| Convert a high-bandwidth topic into a tiny summary | Yes: put rule with a Jinja template |
| Just stop recording the topic in the first place | No: narrow your recorder's topic list (Manage → Recording configuration) |

The recorder's topic list is your first line of defence — if a topic shouldn't be recorded at all, drop it there. Use redaction when you do need to record a topic (for replay, scenarios, or local diagnostics) but want to scrub something out before upload.

How it fits together

Redaction is configured in two files:

  • edge-sync.yaml — the redaction: block. Points at the rules file, sets the failure policy, configures the audit log and quarantine directory.
  • redaction.yaml — the rules themselves. Channel allow/deny, per-topic transforms, metadata-record transforms, named functions.

The rules file is a separate file because operators rotate it independently of edge-sync.yaml (different review cadence, sometimes different reviewers).

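recorder writes ──► /recordings/*.mcap ──► edge-sync picks up
                                               │
                                               ▼
                                          edge-transform
                                          (channel filter →
                                           per-topic transform)
                                               │
                                               ▼
                                          redacted *.mcap      ──► upload
                                          + audit JSONL line
                                          + audit MCAP record (embedded)

Files that fail under on_rule_error: skip_file go to the quarantine directory instead and are not uploaded.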

Quick start

Step 1: Author a rules file

Create /etc/alloy/redaction.yaml:

enabled: true

# Cheapest filter — drop topics before any decode.
channels:
  allow: ["*"]
  deny:
    - "/audio/**"
    - "/user/*"

# Per-topic mappings. First match wins.
transforms:
  # Replace the data field with a hash.
  - match: "/robot_status"
    transform:
      type: patch
      schema: "std_msgs/msg/String"
      overrides:
        data: '{{ original | hash(algo="sha256") }}'

# MCAP metadata records — separate from message channels.
metadata:
  - match: "operator_*"
    transform:
      type: patch
      overrides:
        operator_name: '{{ original | hash(algo="sha256") }}'
        # Strip emails out of free-text notes — replacement is the second arg.
        operator_notes: '{{ original | regex_strip("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}", "[EMAIL]") }}'

For multi-pattern stripping (emails and phone numbers and IPs in the same field), use regex_redact with a pattern list — it replaces every match with [REDACTED]:

overrides:
  description: '{{ original | regex_redact([
    "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}",
    "\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b",
    "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"
  ]) }}'
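
To check a pattern list before it goes into a rule, it can help to run the same expressions over sample text locally. The sketch below is a stand-in written with Python's re module, not Alloy's implementation; the function name redact_all is hypothetical, and it assumes regex_redact applies the patterns in list order.

```python
import re

# The same three patterns as the rule above: email, 10-digit phone, IPv4-like.
PATTERNS = [
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",
    r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b",
]

def redact_all(text, patterns, replacement="[REDACTED]"):
    """Apply each pattern in turn and replace every match.

    Hypothetical stand-in for regex_redact, for local testing only.
    """
    for pattern in patterns:
        text = re.sub(pattern, replacement, text)
    return text

sample = "Call 555-123-4567 or mail ops@example.com from 10.0.0.12"
print(redact_all(sample, PATTERNS))
```

Text that matches none of the patterns passes through unchanged, which makes it easy to spot a pattern that is too narrow.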

The full schema (named functions, put vs patch, match selectors, hash salts, includes) lives in the redaction reference.

Step 2: Wire it into edge-sync.yaml

redaction:
  enabled: true
  rules_file: /etc/alloy/redaction.yaml
  on_rule_error: skip_file         # quarantine the file, don't upload (alt: skip_record [default: drop bad records, keep filtering] / pass_original)
  quarantine:
    mode: keep                     # keep | delete — preserves originals so operators can review
    dir: /var/lib/alloy/edge-sync/quarantine
  audit:
    jsonl_path: /var/lib/alloy/edge-sync/redaction-audit.jsonl   # set to enable the JSONL sidecar; omit to disable
    embed_in_mcap: true                                          # also write the audit summary inside the redacted file

Step 3: Dry-run before flipping it on

Set dry_run: true in redaction.yaml (or pass --dry-run to alloy-edge sync). The redactor writes filtered output to <input_dir>/.dry-run/<timestamp>/ (alongside the watched recordings) and skips uploads entirely. Inspect the result, confirm it matches expectations, then turn dry-run off.

alloy-edge sync --config /etc/alloy/edge-sync.yaml --dry-run

Inspect the dry-run output the same way you inspect any MCAP — mcap info, mcap cat, or open it in Foxglove / Alloy Replay (drag-drop into the web uploader).

Step 4: Enable and watch

Once dry-run looks right, set dry_run: false, restart edge-sync, and tail the audit log:

tail -f /var/lib/alloy/edge-sync/redaction-audit.jsonl

Each line is one file's redaction summary — messages and bytes in/out, per-channel dropped + transformed counts, denied channels, per-helper match counts, plus the rules_hash (sha256:…) of the config that produced it.
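
Because the sidecar is one JSON object per line, it is easy to summarise with a few lines of script. The sketch below counts entries per rules_hash, which shows at a glance whether a rules rollout has taken effect. It assumes only that each line is a JSON object with a rules_hash key; the sample lines are synthetic, and real entries carry more fields.

```python
import json
from collections import Counter

def count_by_rules_hash(lines):
    """Count audit entries per rules_hash; blank lines are skipped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        counts[entry.get("rules_hash", "<missing>")] += 1
    return counts

# Synthetic lines for illustration only.
sample_lines = [
    '{"rules_hash": "sha256:aaa"}',
    '{"rules_hash": "sha256:aaa"}',
    "",
    '{"rules_hash": "sha256:bbb"}',
]
for rules_hash, n in count_by_rules_hash(sample_lines).items():
    print(rules_hash, n)
```

To run it against a real log, pass an open file handle for the configured jsonl_path in place of sample_lines.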

The two transform shapes — put vs patch

Each per-topic rule is one of two shapes. Pick based on how stable the upstream schema is.

put — whitelist (safe by default)

Spell out the entire output. Anything not mentioned in the template disappears. New upstream fields don't leak.

- match: "/operator/command"
  transform:
    type: put
    schema: "your_msgs/OperatorCommand"
    template: |-
      {
        "operator_id": {{ operator_id | hash(algo="sha256") | tojson }},
        "command": {{ command | tojson }},
        "timestamp": {{ timestamp | tojson }}
      }

operator_id is hashed; command and timestamp pass through. Any other field upstream — notes, location, payload — disappears. That's the put semantic: spell out what you keep, everything else drops.

Use put for compliance channels — anything where "this field appeared upstream and we didn't redact it" would be a problem.

patch — denylist (concise)

Pass everything through, override only the listed fields. New upstream fields flow through unchanged.

- match: "/robot_status"
  transform:
    type: patch
    schema: "std_msgs/msg/String"
    overrides:
      data: '{{ original | hash(algo="sha256") }}'

Use patch when the schema is stable and you only need to neutralise one or two fields. If upstream adds a new field you didn't anticipate, it'll flow through — that's the trade-off for the shorter rule.
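
The difference between the two shapes is easiest to see on a plain dictionary. The sketch below models the semantics only; apply_put, apply_patch, and the mask helper are hypothetical names for illustration, not part of the redactor.

```python
def apply_put(message, keep):
    """Whitelist: build the output only from fields named in `keep`."""
    return {field: fn(message[field]) for field, fn in keep.items()}

def apply_patch(message, overrides):
    """Denylist: copy everything, then rewrite only the overridden fields."""
    out = dict(message)
    for field, fn in overrides.items():
        out[field] = fn(message[field])
    return out

def identity(value):
    return value

def mask(value):
    # Placeholder for a real hash helper.
    return "<hashed>"

# "notes" stands for a field that upstream added after the rule was written.
msg = {"operator_id": "alice", "command": "stop", "timestamp": 17, "notes": "free text"}

put_out = apply_put(msg, {"operator_id": mask, "command": identity, "timestamp": identity})
patch_out = apply_patch(msg, {"operator_id": mask})
print(put_out)    # no "notes" key
print(patch_out)  # "notes" flows through unchanged
```

With put, the unexpected notes field never reaches the output; with patch, it does.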

Examples here use hash(algo="sha256") (salted) for de-identification. There's also sha256_short for content fingerprinting: the same value always produces the same 8-char digest, but it is unsalted, so anyone with a wordlist of candidate values can recover the input by hashing each candidate. It is not safe for de-identification. See the helper reference.

When a rule fails

Templates can break: a Jinja syntax error, or a schema change that removes a field the template references. There are two distinct failure points:

  • Config-load time: bad Jinja syntax (typo, unclosed tag, unknown filter) rejects the rules file before edge-sync starts. You'll see this immediately — not at upload time.

  • Runtime: a syntactically-valid template that fails on a particular message (e.g. {{ status[0].hardware_id }} when a message has empty status). Behaviour is governed by on_rule_error:

    • on_rule_error: skip_record (default) — the offending record is dropped, the rest of the file is filtered and uploaded. One bad message doesn't sink the whole file.
    • on_rule_error: skip_file — the original lands in the quarantine directory and is not uploaded. Safest. Operator fixes the rule, removes the file from quarantine, and edge-sync retries on its next sweep.
    • on_rule_error: pass_original — fail-open: the unredacted original uploads. Requires explicit opt-in because it can leak. Use only when "drop nothing on the floor" beats "leak nothing", and document why.
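
The three runtime policies can be modelled as a small decision function. The sketch below illustrates the behaviour described in the list above and is not the redactor's code; process_file and first_hardware_id are hypothetical names, and a Python exception stands in for a template that fails on one message.

```python
def process_file(records, transform, on_rule_error="skip_record"):
    """Return (action, output_records) for one file under a given policy."""
    if on_rule_error not in {"skip_record", "skip_file", "pass_original"}:
        raise ValueError(f"unknown on_rule_error: {on_rule_error}")
    output = []
    for record in records:
        try:
            output.append(transform(record))
        except Exception:
            if on_rule_error == "skip_record":
                continue                    # drop only the offending record
            if on_rule_error == "skip_file":
                return "quarantine", []     # nothing is uploaded
            return "upload", list(records)  # pass_original: unredacted upload
    return "upload", output

def first_hardware_id(record):
    # Raises IndexError when status is empty.
    return {"hardware_id": record["status"][0]["hardware_id"]}

records = [{"status": [{"hardware_id": "arm-1"}]}, {"status": []}]
for policy in ("skip_record", "skip_file", "pass_original"):
    print(policy, process_file(records, first_hardware_id, policy))
```

Only pass_original ever returns the untransformed records, which is why it is the one policy that can leak.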

The audit trail

Every redacted file gets two parallel audit records:

| Carrier | Where | Why |
| --- | --- | --- |
| JSONL sidecar | {state_dir}/redaction-audit.jsonl (one line per file) | Cheap to grep, cheap to stream: the operational logbook |
| Embedded MCAP record | A metadata record named alloy.redaction.audit inside the redacted file | The redacted file documents itself: survives re-uploads, copies, and forwards |

The two carriers contain the same JSON shape. Both default on. Turn embed_in_mcap off only when the rule layout itself is sensitive (e.g. you don't want the redacted file to disclose which topics were touched).

Each audit entry includes a rules_hash (formatted sha256:<hex>) over the merged config so you can prove later which version of the rules produced a given file.
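
A config hash is only useful as evidence if the same rules always hash to the same value. One common way to get that property is to hash a canonical serialisation. The sketch below assumes a canonical JSON dump with sorted keys; the exact canonicalisation Alloy uses is not documented here, so values computed this way are not expected to match real rules_hash entries.

```python
import hashlib
import json

def rules_hash(merged_config):
    """sha256 over a canonical JSON dump (sorted keys, compact separators).

    The canonical form is an assumption of this sketch.
    """
    canonical = json.dumps(merged_config, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"channels": {"allow": ["*"], "deny": ["/audio/**"]}, "enabled": True}
b = {"enabled": True, "channels": {"deny": ["/audio/**"], "allow": ["*"]}}
c = {"enabled": True, "channels": {"deny": ["/audio/**", "/user/*"], "allow": ["*"]}}

print(rules_hash(a) == rules_hash(b))  # same rules, different key order
print(rules_hash(a) == rules_hash(c))  # one extra deny entry
```

Key order does not affect the result, while any change to a rule does.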

Hash salts and rotation

{{ original | hash(algo="sha256") }} and friends use a salt configured at the top of redaction.yaml:

hash_salt: "${ALLOY_HASH_SALT}"

The ${VAR} form is interpolated from the environment at config-load time. Rotate by changing the env var and restarting edge-sync — no rule edit needed. The audit log records hash_salt_fingerprint: sha256(salt)[:8] per file so a compliance reviewer can prove a rotation happened without ever seeing the salt.

If the env var isn't set when the config loads, the loader fails fast — you can't accidentally ship with a missing salt.
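
The pieces described above (environment interpolation, fail-fast on a missing variable, a salted digest, and a salt fingerprint) can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, HMAC-SHA256 is an assumed way to combine salt and value, and the fingerprint is taken as the first 8 hex characters of sha256(salt).

```python
import hashlib
import hmac
import os
import re

def interpolate_env(value, env=os.environ):
    """Expand ${VAR} references; raise if a variable is unset (fail fast)."""
    def lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", lookup, value)

def salted_hash(value, salt):
    """Keyed digest via HMAC-SHA256 (assumed construction, not confirmed)."""
    return hmac.new(salt.encode(), value.encode(), hashlib.sha256).hexdigest()

def salt_fingerprint(salt):
    """First 8 hex characters of sha256(salt)."""
    return hashlib.sha256(salt.encode()).hexdigest()[:8]

env = {"ALLOY_HASH_SALT": "example-salt"}  # stand-in for the process environment
salt = interpolate_env("${ALLOY_HASH_SALT}", env)

print(salted_hash("alice", salt) == salted_hash("alice", "rotated-salt"))
print(salt_fingerprint(salt) == salt_fingerprint("rotated-salt"))
```

After a rotation the same input hashes to a different digest and the fingerprint changes, so a reviewer can confirm the rotation from the audit log alone.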

Channel filter is the fastest

Rules under transforms: decode the message; rules under channels.deny don't. If a topic should never leave the robot, deny it at the channel filter — it costs nothing and never decodes.

Only reach for a transform: when you need to keep some part of a message and remove or rewrite the rest.
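
The channel filter needs only the topic name, which is why it can run before any message is decoded. The sketch below models that decision. It rests on assumptions this page does not confirm: that ** spans / separators while * stays within one path segment, that a lone * means "everything" (mirroring allow: ["*"]), and that deny takes precedence over allow. glob_to_regex and is_allowed are hypothetical names.

```python
import re

def glob_to_regex(pattern):
    """Translate a topic glob into a compiled regex (assumed semantics)."""
    if pattern == "*":
        return re.compile(".*")          # lone "*" treated as match-everything
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")             # "**" may cross "/" separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")          # "*" stays within one segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))

def is_allowed(topic, allow, deny):
    """Decide from the topic name alone; deny wins over allow."""
    if any(glob_to_regex(p).fullmatch(topic) for p in deny):
        return False
    return any(glob_to_regex(p).fullmatch(topic) for p in allow)

allow = ["*"]
deny = ["/audio/**", "/user/*"]
for topic in ("/audio/mic/left", "/user/name", "/robot_status"):
    print(topic, is_allowed(topic, allow, deny))
```

Check the redaction reference for the real glob semantics before relying on how * and ** treat nested topics.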

Limits and things to know

  • MCAP only. Other recording formats aren't supported by the redactor.
  • First match wins. Order rules from most-specific to most-general.

Next steps

  • Redaction reference — full redaction.yaml schema: every field, the function library, and the match selector forms.
  • Configuration reference — the redaction: block in edge-sync.yaml (rules file path, quarantine, audit, failure policy).
