
Redact

Strip or hash sensitive fields out of MCAP recordings on the device, before they leave your network

This feature is in beta. Schema and CLI may change between releases — pin your alloy-edge version when authoring rules for production.

Some recordings carry data you don't want leaving the robot — operator names in metadata, microphone audio on /audio/**, hostnames echoed in std_msgs/String topics. Alloy Edge can rewrite or drop those records on the device before edge-sync uploads, so the cloud only ever sees the sanitised version.

The redactor is a streaming MCAP rewriter. It reads each record once, applies any matching rules, and writes a new file alongside the original — no buffering, no decode for channels you didn't list, and a self-documenting audit trail per file.

What you get

Capability | What it does
Channel filter | Drop or allow whole topics by glob (/audio/**, /user/*). Runs before any decode, so it costs nothing.
Per-topic transforms | Rewrite individual messages with put (whitelist: spell out what you keep) or patch (denylist: override specific fields). Templates use Jinja2.
Metadata redaction | Same shape applied to MCAP metadata records (operator info, calibration, config blobs).
Regex redactors | Strip patterns out of string fields with regex_strip (one pattern + replacement) or regex_redact (apply a list of patterns, all matches → [REDACTED]). Bring your own patterns for emails, phone numbers, IPs, employee IDs, etc.
Hash salt rotation | hash(...) reads ${VAR} from the environment at config-load time. Rotate by changing the env var, with no rule edits.
Dry-run mode | Run the filter with upload_type: none and route both originals and redacted artefacts to <input_dir>/.dry-run/ to inspect locally before flipping it on.
Self-documenting audit | JSONL sidecar + embedded MCAP metadata record per file. Each carries rules_hash so you can prove later which rules version produced a given file.
Failure policy | Fail-closed by default: a broken rule can skip the file instead of uploading unredacted data. pass_original is opt-in.

When to use it

You want to… | Use redaction?
Drop a whole topic the cloud should never see | Yes: channel filter (channels.deny)
Hash an operator's name in metadata for compliance | Yes: metadata: rule with hash(...)
Replace one field in a known message with a constant | Yes: inline patch rule
Convert a high-bandwidth topic into a tiny summary | Yes: put rule with a Jinja template
Just stop recording the topic in the first place | No: narrow your recorder's topic list (Manage → Recording configuration)

The recorder's topic list is your first line of defence — if a topic shouldn't be recorded at all, drop it there. Use redaction when you do need to record a topic (for replay, scenarios, or local diagnostics) but want to scrub something out before upload.

How it fits together

Redaction is configured in two files:

  • edge-sync.yaml — a redaction pipeline step (v0.8+), or the legacy redaction: block (v0.7, still accepted), plus lifecycle: policy. Points at the rules file, sets failure policy and audit behavior, and controls what happens to original/redacted artefacts after upload.
  • redaction.yaml — the rules themselves. Channel allow/deny, per-topic transforms, metadata-record transforms, named functions. This file is identical across v0.7 and v0.8.

The rules file is a separate file because operators rotate it independently of edge-sync.yaml (different review cadence, sometimes different reviewers).

recorder writes ──► /recordings/*.mcap ──► edge-sync picks up
                                                │
                                                ▼
                                          edge-transform
                                          (channel filter →
                                           per-topic transform)
                                                │
                                                ▼
                                          /recordings/.alloy-redacted/*.mcap   ──► upload
                                          + audit JSONL line
                                          + audit MCAP record (embedded)

Quick start

Step 1: Author a rules file

Create /etc/alloy/redaction.yaml:

enabled: true

# Cheapest filter — drop topics before any decode.
channels:
  allow: ["*"]
  deny:
    - "/audio/**"
    - "/user/*"

# Per-topic mappings. First match wins.
transforms:
  # Replace the data field with a hash.
  - match: "/robot_status"
    transform:
      type: patch
      schema: "std_msgs/msg/String"
      overrides:
        data: '{{ original | hash(algo="sha256") }}'

# MCAP metadata records — separate from message channels.
metadata:
  - match: "operator_*"
    transform:
      type: patch
      overrides:
        operator_name: '{{ original | hash(algo="sha256") }}'
        # Strip emails out of free-text notes — replacement is the second arg.
        operator_notes: '{{ original | regex_strip("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}", "[EMAIL]") }}'

For multi-pattern stripping (emails and phone numbers and IPs in the same field), use regex_redact with a pattern list — it replaces every match with [REDACTED]:

overrides:
  description: '{{ original | regex_redact([
    "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}",
    "\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b",
    "\\b\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\b"
  ]) }}'
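
To see what those three patterns catch, here is a minimal Python sketch of the same idea. It is not the Alloy implementation: regex_redact_sketch is a hypothetical stand-in that assumes the patterns are applied in list order and that every match becomes [REDACTED].

```python
import re

# Patterns copied from the rule above, written as Python raw strings.
PATTERNS = [
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",  # email address
    r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b",                    # 10-digit phone number
    r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b",           # IPv4-like address
]

def regex_redact_sketch(text, patterns, replacement="[REDACTED]"):
    """Hypothetical stand-in: apply each pattern in order, replacing every match."""
    for pattern in patterns:
        text = re.sub(pattern, replacement, text)
    return text

sample = "Call 555-867-5309 or mail ops@example.com from 10.0.0.12"
print(regex_redact_sketch(sample, PATTERNS))
# Call [REDACTED] or mail [REDACTED] from [REDACTED]
```

Running candidate patterns against a few representative strings like this is a cheap way to catch over- or under-matching before the rule goes into a dry run.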

The full schema (named functions, put vs patch, match selectors, hash salts, includes) lives in the redaction reference.

Step 2: Wire it into edge-sync.yaml

In v0.8+ redaction is a pipeline: step; the failure-policy and audit knobs move under lifecycle.transform. The v0.7 flat redaction: block still works in v0.8 (the loader auto-migrates it; run alloy-edge migrate to rewrite on disk).

edge-sync.yaml (v0.8+, pipeline step)
version: 1
pipeline:
  - transform: /etc/alloy/redaction.yaml   # rules file from Step 1
    upload: true                           # upload the redacted artefact
    original_after: move                   # set the unredacted original aside after upload
lifecycle:
  original:
    move_to: /var/lib/alloy/edge-sync/originals
  transform:
    on_rule_error: skip_file               # drop the file if a rule fails (alt: skip_record [default] / pass_original)
    audit:
      jsonl_path: /var/lib/alloy/edge-sync/redaction-audit.jsonl   # set to enable the JSONL sidecar; omit to disable
      embed_in_mcap: true                                          # also write the audit summary inside the redacted file

edge-sync.yaml (v0.7, legacy redaction: block)
redaction:
  enabled: true
  rules_file: /etc/alloy/redaction.yaml
  on_rule_error: skip_file         # drop the file (alt: skip_record [default — drop bad records, keep filtering] / pass_original)
  audit:
    jsonl_path: /var/lib/alloy/edge-sync/redaction-audit.jsonl   # set to enable the JSONL sidecar; omit to disable
    embed_in_mcap: true                                          # also write the audit summary inside the redacted file
lifecycle:
  original:
    after: move
    move_to: /var/lib/alloy/edge-sync/originals
  redacted:
    after: keep

Step 3: Dry-run before flipping it on

Run the pipeline with no network: set upload_type: none in edge-sync.yaml, and use lifecycle to route the originals and the redacted artefacts into a .dry-run/ sandbox. Inspect the result, confirm it matches expectations, then revert to normal uploads.

edge-sync.yaml: dry-run overrides (v0.8+)
upload_type: none                  # filter runs, nothing leaves the device
version: 1
pipeline:
  - transform: /etc/alloy/redaction.yaml
    original_after: move
lifecycle:
  original:
    after: move
    move_to: .dry-run/originals    # v0.8 rejects a shared move_to — keep the two distinct
  transform:
    after: move
    move_to: .dry-run/redacted

edge-sync.yaml: dry-run overrides (v0.7, legacy)
upload_type: none                  # filter runs, nothing leaves the device
redaction:
  enabled: true
  rules_file: /etc/alloy/redaction.yaml
lifecycle:
  original:
    after: move
    move_to: .dry-run              # relative to input_dir
  redacted:
    after: move
    move_to: .dry-run

Or, for a one-shot run without editing the file, pass --dry-run to the CLI — it applies the same overrides at load time (and is exempt from the v0.8 shared-move_to check, routing both into one timestamped .dry-run/ subdir):

alloy-edge sync --config /etc/alloy/edge-sync.yaml --dry-run

Both the unredacted originals and the redacted artefacts land under <input_dir>/.dry-run/. Inspect them the same way you inspect any MCAP — mcap info, mcap cat, or drag-drop into the web uploader to open in Foxglove / Alloy Replay.

Step 4: Enable and watch

Once dry-run looks right, remove the dry-run overrides (restore upload_type to whatever you normally use, drop the lifecycle block or set the actions you actually want for steady-state), restart edge-sync, and tail the audit log:

tail -f /var/lib/alloy/edge-sync/redaction-audit.jsonl

Each line is one file's redaction summary — messages and bytes in/out, per-channel dropped + transformed counts, denied channels, per-helper match counts, plus the rules_hash (sha256:…) of the config that produced it.
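
Because each line is standalone JSON, the log is easy to summarise with a few lines of script. The sketch below counts audit entries per rules_hash, which is one way to spot files produced by an outdated rules version. Only rules_hash is a documented field; the "file" key in the sample data is made up for illustration.

```python
import json
from collections import Counter

def files_per_rules_hash(lines):
    """Count audit entries per rules_hash, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        counts[entry.get("rules_hash", "<missing>")] += 1
    return counts

# Hypothetical entries: only rules_hash is documented; "file" is illustrative.
sample_lines = [
    '{"file": "a.mcap", "rules_hash": "sha256:aaaa"}',
    '{"file": "b.mcap", "rules_hash": "sha256:aaaa"}',
    '{"file": "c.mcap", "rules_hash": "sha256:bbbb"}',
]
counts = files_per_rules_hash(sample_lines)
print(counts)
```

To run it against the real log, pass an open file handle instead of sample_lines, since iterating a file yields one line at a time.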

The two transform shapes — put vs patch

Each per-topic rule is one of two shapes. Pick based on how stable the upstream schema is.

put — whitelist (safe by default)

Spell out the entire output. Anything not mentioned in the template disappears. New upstream fields don't leak.

- match: "/operator/command"
  transform:
    type: put
    schema: "your_msgs/OperatorCommand"
    template: |-
      {
        "operator_id": {{ operator_id | hash(algo="sha256") | tojson }},
        "command": {{ command | tojson }},
        "timestamp": {{ timestamp | tojson }}
      }

operator_id is hashed; command and timestamp pass through. Any other field upstream — notes, location, payload — disappears. That's the put semantic: spell out what you keep, everything else drops.

Use put for compliance channels — anything where "this field appeared upstream and we didn't redact it" would be a problem.

patch — denylist (concise)

Pass everything through, override only the listed fields. New upstream fields flow through unchanged.

- match: "/robot_status"
  transform:
    type: patch
    schema: "std_msgs/msg/String"
    overrides:
      data: '{{ original | hash(algo="sha256") }}'

Use patch when the schema is stable and you only need to neutralise one or two fields. If upstream adds a new field you didn't anticipate, it'll flow through — that's the trade-off for the shorter rule.
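
The difference between the two shapes is easiest to see on a message that has gained a field the rule author never anticipated. This is a plain-Python sketch of the semantics only, with hypothetical helper names (apply_put, apply_patch) and a placeholder in place of hash(...). It does not model Jinja rendering or message decoding.

```python
def apply_put(message, keep):
    """Whitelist: build the output from scratch, so unlisted fields vanish."""
    return {field: fn(message[field]) for field, fn in keep.items()}

def apply_patch(message, overrides):
    """Denylist: copy everything, then overwrite only the listed fields."""
    out = dict(message)
    for field, fn in overrides.items():
        out[field] = fn(message[field])
    return out

def identity(value):
    return value

def mask(value):
    return "<hashed>"  # placeholder standing in for hash(...)

# "notes" plays the role of a field that upstream added after the rule was written.
msg = {"operator_id": "alice", "command": "dock", "notes": "new upstream field"}

put_out = apply_put(msg, {"operator_id": mask, "command": identity})
patch_out = apply_patch(msg, {"operator_id": mask})

print(put_out)    # "notes" is gone
print(patch_out)  # "notes" flows through unchanged
```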

Examples here use hash(algo="sha256") (salted) for de-identification. There's also sha256_short for content fingerprinting: the same value always hashes to the same 8-char digest, but it is unsalted, so anyone with a wordlist of likely values can recover the input. It is not safe for de-id. See the helper reference.
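
The wordlist risk is worth seeing concretely. The sketch below assumes sha256_short is the first 8 hex characters of an unsalted SHA-256 digest, and that the salted helper prepends the salt to the value. Both are assumptions for illustration; the real helpers may be built differently.

```python
import hashlib

def sha256_short_sketch(value):
    # Assumption: first 8 hex chars of an unsalted SHA-256 digest.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]

def salted_hash_sketch(value, salt):
    # Assumption: the salt is prepended; the real helper may combine it differently.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

# An attacker who can guess the candidate values precomputes their digests.
wordlist = ["alice", "bob", "carol"]
lookup = {sha256_short_sketch(name): name for name in wordlist}

leaked = sha256_short_sketch("bob")
print(lookup.get(leaked))  # the digest maps straight back to the name

# With a secret salt, the precomputed table no longer lines up with the output.
salted = salted_hash_sketch("bob", salt="secret-salt")
print(salted[:8] in lookup)
```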

When a rule fails

Templates can break — a Jinja syntax error, a schema change that removes a field reference. Two distinct failure points:

  • Config-load time: bad Jinja syntax (typo, unclosed tag, unknown filter) rejects the rules file before edge-sync starts. You'll see this immediately — not at upload time.

  • Runtime: a syntactically-valid template that fails on a particular message (e.g. {{ status[0].hardware_id }} when a message has empty status). Behaviour is governed by on_rule_error:

    • on_rule_error: skip_record (default): the offending record is dropped, and the rest of the file is filtered and uploaded. One bad message doesn't sink the whole file.
    • on_rule_error: skip_file — the file is not uploaded when a rule fails. Pair with lifecycle.original.after: move if you want to retain failed originals for operator review.
    • on_rule_error: pass_original — fail-open: the unredacted original uploads. Requires explicit opt-in because it can leak. Use only when "drop nothing on the floor" beats "leak nothing", and document why.
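
The three runtime policies above can be sketched as a small Python function. This is an illustration of the described behaviour, not edge-sync's code: redact_file and hardware_id_rule are hypothetical names, and a None result stands for "nothing is uploaded".

```python
def redact_file(records, transform, on_rule_error="skip_record"):
    """Return (records_to_upload, note); None means nothing is uploaded."""
    out = []
    for record in records:
        try:
            out.append(transform(record))
        except Exception:
            if on_rule_error == "skip_record":
                continue                     # drop this record, keep filtering
            if on_rule_error == "skip_file":
                return None, "file skipped"  # fail-closed: nothing uploads
            if on_rule_error == "pass_original":
                return list(records), "original uploaded unredacted"
            raise ValueError(f"unknown policy: {on_rule_error}")
    return out, "redacted"

def hardware_id_rule(record):
    # Mirrors {{ status[0].hardware_id }}: raises IndexError when status is empty.
    _ = record["status"][0]["hardware_id"]
    return {"hardware_id": "<hashed>"}

records = [{"status": [{"hardware_id": "hw-1"}]}, {"status": []}]
for policy in ("skip_record", "skip_file", "pass_original"):
    print(policy, redact_file(records, hardware_id_rule, policy))
```

Only the pass_original branch returns unredacted data, which is why it is the one that requires explicit opt-in.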

The audit trail

Every redacted file gets two parallel audit records:

Carrier | Where | Why
JSONL sidecar | {state_dir}/redaction-audit.jsonl (one line per file) | Cheap to grep, cheap to stream: the operational logbook
Embedded MCAP record | A metadata record named alloy.redaction.audit inside the redacted file | The redacted file documents itself: survives re-uploads, copies, and forwards

The two carriers contain the same JSON shape. Both default on. Turn embed_in_mcap off only when the rule layout itself is sensitive (e.g. you don't want the redacted file to disclose which topics were touched).

Each audit entry includes a rules_hash (formatted sha256:<hex>) over the merged config so you can prove later which version of the rules produced a given file.
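
The idea behind a config hash can be sketched in a few lines. The page does not say how the merged config is serialised before hashing, so the canonical-JSON step below (sorted keys, compact separators) is an assumption. A digest computed this way should not be expected to match the one edge-sync writes.

```python
import hashlib
import json

def rules_hash_sketch(merged_config):
    # Assumption: canonical JSON (sorted keys, compact separators) is the hash input.
    canonical = json.dumps(merged_config, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = {"enabled": True, "channels": {"deny": ["/audio/**"]}}
b = {"channels": {"deny": ["/audio/**"]}, "enabled": True}   # same rules, different key order
c = {"enabled": True, "channels": {"deny": ["/audio/**", "/user/*"]}}

print(rules_hash_sketch(a) == rules_hash_sketch(b))  # True: key order does not matter
print(rules_hash_sketch(a) == rules_hash_sketch(c))  # False: the rules changed
```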

Hash salts and rotation

{{ original | hash(algo="sha256") }} and friends use a salt configured at the top of redaction.yaml:

hash_salt: "${ALLOY_HASH_SALT}"

The ${VAR} form is interpolated from the environment at config-load time. Rotate by changing the env var and restarting edge-sync — no rule edit needed. The audit log records hash_salt_fingerprint: sha256(salt)[:8] per file so a compliance reviewer can prove a rotation happened without ever seeing the salt.

If the env var isn't set when the config loads, the loader fails fast — you can't accidentally ship with a missing salt.
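
A rough Python sketch of the two behaviours described here: fail-fast ${VAR} expansion and the salt fingerprint. load_salt and salt_fingerprint are hypothetical names, and sha256(salt)[:8] is read as "first 8 hex characters of the digest", which is an assumption.

```python
import hashlib
import os
import re

def load_salt(raw, env=os.environ):
    """Expand ${VAR} references; raise if any referenced variable is unset."""
    def expand(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", expand, raw)

def salt_fingerprint(salt):
    # Assumption: sha256(salt)[:8] means the first 8 hex characters of the digest.
    return hashlib.sha256(salt.encode("utf-8")).hexdigest()[:8]

# Illustrative salt values; in practice the salt comes from the real environment.
old = load_salt("${ALLOY_HASH_SALT}", env={"ALLOY_HASH_SALT": "salt-2024"})
new = load_salt("${ALLOY_HASH_SALT}", env={"ALLOY_HASH_SALT": "salt-2025"})

# The fingerprint changes on rotation without revealing either salt.
print(salt_fingerprint(old) != salt_fingerprint(new))
```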

Channel filter is the fastest

Rules under transforms: decode the message; rules under channels.deny don't. If a topic should never leave the robot, deny it at the channel filter — it costs nothing and never decodes.

Only reach for a transforms: rule when you need to keep some part of a message and remove or rewrite the rest.
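
Why the channel filter is cheap can be sketched as follows: the decision needs only the topic name, so denied payloads never reach the decoder. The sketch uses Python's fnmatch, where * also matches /. The real matcher may treat * and ** differently, and "deny wins over allow" is an assumption here too.

```python
from fnmatch import fnmatchcase

def channel_allowed(topic, allow, deny):
    """Assumed precedence: a deny match wins; otherwise the topic must match allow."""
    if any(fnmatchcase(topic, pattern) for pattern in deny):
        return False
    return any(fnmatchcase(topic, pattern) for pattern in allow)

def filter_then_decode(messages, allow, deny, decode):
    """Yield decoded messages for allowed topics; denied payloads are never decoded."""
    for topic, payload in messages:
        if not channel_allowed(topic, allow, deny):
            continue
        yield topic, decode(payload)

decoded = []  # records which payloads actually reached the decoder

def counting_decode(payload):
    decoded.append(payload)
    return payload.decode("utf-8")

msgs = [("/audio/mic/left", b"\x00\x01"), ("/robot_status", b"ok"), ("/user/alice", b"hi")]
kept = list(filter_then_decode(msgs, ["*"], ["/audio/**", "/user/*"], counting_decode))
print(kept)     # [('/robot_status', 'ok')]
print(decoded)  # [b'ok']
```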

Limits and things to know

  • MCAP only. Other recording formats aren't supported by the redactor.
  • First match wins. Order rules from most-specific to most-general.
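
The ordering advice matters because a broad pattern placed first shadows every narrower rule after it. A minimal sketch, using fnmatch as a stand-in for the real match selectors and hypothetical rule names:

```python
from fnmatch import fnmatchcase

def first_matching_rule(topic, rules):
    """Return the name of the first rule whose pattern matches, or None."""
    for pattern, name in rules:
        if fnmatchcase(topic, pattern):
            return name
    return None

general_first = [("/operator/*", "generic-patch"), ("/operator/command", "strict-put")]
specific_first = [("/operator/command", "strict-put"), ("/operator/*", "generic-patch")]

print(first_matching_rule("/operator/command", general_first))   # generic-patch
print(first_matching_rule("/operator/command", specific_first))  # strict-put
```

With the general rule first, the strict put rule for /operator/command is never reached.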

Next steps

  • Redaction reference — full redaction.yaml schema: every field, the function library, and the match selector forms.
  • Configuration reference — the redaction pipeline step (and legacy redaction: block) in edge-sync.yaml (rules file path, output/audit knobs, failure policy) plus lifecycle retention controls.
