Sensitivity and Redaction

The sensitivity system ensures sensitive data is never silently included in reports or backups. It runs automatically on every collect and backup — you don't need to think about it unless you want to override it.

Pipeline

Every file processed goes through a three-stage pipeline:

Content → Detection (regex per line) → Classification (HIGH/MEDIUM/LOW) → Action (skip/redact/include)
  1. Detection: each line is tested against 27+ regex patterns
  2. Classification: each match is tagged with a severity level
  3. Action: the highest-severity finding determines the file-level action
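
The first two stages can be sketched as follows. This is a minimal illustration, not the tool's actual source: the `Pattern` and `Finding` shapes and the `detect` function are hypothetical names, and only two patterns from the reference tables on this page are included. The action stage is left out to keep the sketch short.

```typescript
type Severity = "HIGH" | "MEDIUM" | "LOW";
type Action = "skip" | "redact" | "include";

// Hypothetical shape of a pattern definition (not taken from the real source).
interface Pattern {
  id: string;
  label: string;
  regex: RegExp;
  severity: Severity;
  action: Action;
}

// Hypothetical shape of a single match record.
interface Finding {
  patternId: string;
  label: string;
  severity: Severity;
  action: Action;
  line: number; // 1-based line number
}

// Two illustrative patterns copied from the reference tables.
const PATTERNS: Pattern[] = [
  { id: "private-key-pem", label: "private key", regex: /-----BEGIN.*PRIVATE KEY-----/, severity: "HIGH", action: "skip" },
  { id: "email-address", label: "email address", regex: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/, severity: "MEDIUM", action: "include" },
];

// Stage 1 (detection) tests every line against every pattern;
// stage 2 (classification) tags each match with the pattern's severity and default action.
function detect(content: string, patterns: Pattern[] = PATTERNS): Finding[] {
  const findings: Finding[] = [];
  content.split("\n").forEach((text, index) => {
    for (const p of patterns) {
      if (p.regex.test(text)) {
        findings.push({ patternId: p.id, label: p.label, severity: p.severity, action: p.action, line: index + 1 });
      }
    }
  });
  return findings;
}
```

The regexes here carry no `g` flag, so `test()` keeps no state between lines.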

Severity Levels

| Level | Meaning | Default Action |
| --- | --- | --- |
| HIGH | Credentials that grant access: private keys, API keys, tokens, passwords, database URLs | skip for private keys, redact for everything else |
| MEDIUM | Potentially identifying information: IP addresses, email addresses | redact for IPs, include for emails |
| LOW | Machine-specific paths: home directory with username | include |

Actions

| Action | What Happens |
| --- | --- |
| skip | Entire file is dropped from output; it is not included in the report or backup |
| redact | Matched values are replaced with `[REDACTED]`; file structure is preserved |
| include | File is included as-is; the finding is logged in the sensitivity report |

When a file has several findings, the most restrictive action wins: skip takes precedence over redact, and redact over include. If a file has both a skip finding (private key) and a redact finding (auth token), the file is skipped entirely.
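
A minimal sketch of that per-file resolution, assuming the precedence skip > redact > include. The names `ACTION_RANK` and `resolveFileAction` are hypothetical, not taken from the tool's source.

```typescript
type Action = "skip" | "redact" | "include";

// Assumed precedence: a higher rank is more restrictive.
const ACTION_RANK: Record<Action, number> = { include: 0, redact: 1, skip: 2 };

// Returns the most restrictive action among a file's findings.
// A file with no findings falls back to "include".
function resolveFileAction(actions: Action[]): Action {
  return actions.reduce<Action>(
    (winner, next) => (ACTION_RANK[next] > ACTION_RANK[winner] ? next : winner),
    "include",
  );
}
```

For example, `resolveFileAction(["redact", "skip"])` yields `"skip"`, which matches the private key plus auth token case described above.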

Complete Pattern Reference

HIGH Severity — Private Keys (action: skip)

| Pattern ID | Label | Regex | Example Match |
| --- | --- | --- | --- |
| private-key-pem | private key | `/-----BEGIN.*PRIVATE KEY-----/` | `-----BEGIN RSA PRIVATE KEY-----` |
| pgp-private-key | PGP private key | `/-----BEGIN PGP PRIVATE KEY BLOCK-----/` | PGP private key block header |

These patterns cause the entire file to be skipped — private key files should never appear in reports or backups.

HIGH Severity — Generic Secrets (action: redact)

| Pattern ID | Label | Regex | Example Match |
| --- | --- | --- | --- |
| generic-secret | secret value | `/(PASSWORD\|SECRET_KEY\|API_SECRET\|PRIVATE_KEY\|AUTH_TOKEN\|ACCESS_TOKEN\|SECRET)\s*[=:]\s*\S+/i` | `APP_SECRET=my-secret-value` |
| generic-api-key | API key | `/(API_KEY\|APIKEY)\s*[=:]\s*\S+/i` | `SOME_API_KEY=abc123def456` |

No leading \b

These patterns intentionally omit \b before the keyword. Because _ is a word character, there is no word boundary between _ and S, so \bSECRET would not match inside APP_SECRET.
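
The effect is easy to check in isolation. The sketch below compares a shortened form of the generic-secret pattern with a hypothetical variant that adds a leading `\b`; the variant exists only for this comparison.

```typescript
// Hypothetical variant with a leading \b, for comparison only.
const withBoundary = /\b(PASSWORD|SECRET)\s*[=:]\s*\S+/i;
// Shortened form of the documented generic-secret pattern (no leading \b).
const withoutBoundary = /(PASSWORD|SECRET)\s*[=:]\s*\S+/i;

const line = "APP_SECRET=my-secret-value";

// false: "_" and "S" are both word characters, so \b cannot match between them.
const boundaryMatch = withBoundary.test(line);
// true: the keyword is found inside the longer variable name.
const plainMatch = withoutBoundary.test(line);
// true: at the start of a line, \b does match before "S".
const bareMatch = withBoundary.test("SECRET=abc");
```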

HIGH Severity — Auth Tokens (action: redact)

| Pattern ID | Label | Regex | Example Match |
| --- | --- | --- | --- |
| auth-token-npm | npm auth token | `/_authToken=.+/` | `_authToken=secret-token-123` |
| bearer-token | bearer token | `/Bearer\s+[A-Za-z0-9\-._~+/]+=*/` | `Authorization: Bearer eyJhbG...` |
| github-token | GitHub token | `/\b(ghp_\|gho_\|ghu_\|ghs_\|github_pat_)...\b/` | `ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZab...` |
| npm-token | npm token | `/\bnpm_[A-Za-z0-9]{36,}\b/` | `npm_aBcDeFgHiJkLmNoPqRsTuVwXyZ...` |

HIGH Severity — AI Provider Keys (action: redact)

| Pattern ID | Label | Regex | Example Match |
| --- | --- | --- | --- |
| openai-key | OpenAI key | `/\bsk-(proj-)?[A-Za-z0-9]{20,}\b/` | `sk-proj-1234567890abcdefghij...` |
| anthropic-key | Anthropic key | `/\bsk-ant-[A-Za-z0-9\-]{20,}\b/` | `sk-ant-api03-abcdefghijklmno...` |

HIGH Severity — Cloud Provider Keys (action: redact)

| Pattern ID | Label | Regex | Example Match |
| --- | --- | --- | --- |
| aws-access-key | AWS access key | `/\bAKIA[0-9A-Z]{16}\b/` | `AKIAIOSFODNN7EXAMPLE` |
| aws-secret-key | AWS secret key | `/aws_secret_access_key\s*=\s*.+/i` | `aws_secret_access_key = wJalrXU...` |
| google-api-key | Google API key | `/\bAIza[A-Za-z0-9\-_]{35}\b/` | `AIzaSyBcDeFgHiJkLmNoPqRsTuV...` |
| google-oauth-token | Google OAuth token | `/\bya29\.[A-Za-z0-9\-_]+\b/` | `ya29.a0AfH6SMBx...` |
| firebase-key | Firebase key | `/\bAAAA[A-Za-z0-9\-_:]{100,}\b/` | Firebase server key |
| cloudflare-token | Cloudflare token | `/\bv1\.0-[A-Fa-f0-9]{24,}\b/` | Cloudflare API token |

HIGH Severity — Payment & SaaS Keys (action: redact)

| Pattern ID | Label | Regex | Example Match |
| --- | --- | --- | --- |
| stripe-key | Stripe key | `/\b(sk_live_\|sk_test_\|pk_live_\|pk_test_\|rk_live_\|rk_test_)...\b/` | `sk_live_abcdefghijklmnopqrstuv` |
| mapbox-token | Mapbox token | `/\b(pk\|sk)\.eyJ...\b/` | `pk.eyJhbGciOi.abcdef123456` |
| twilio-key | Twilio key | `/\bSK[0-9a-fA-F]{32}\b/` | Twilio API key |
| sendgrid-key | SendGrid key | `/\bSG\.[A-Za-z0-9\-_]{22,}\.[A-Za-z0-9\-_]{22,}\b/` | `SG.abcdef...wxyz123...` |

HIGH Severity — Messaging Platforms (action: redact)

| Pattern ID | Label | Regex | Example Match |
| --- | --- | --- | --- |
| slack-token | Slack token | `/\b(xoxb\|xoxp\|xoxs\|xoxa\|xoxr)-...\b/` | `xoxb-123456789012-abcdefghij` |
| discord-token | Discord token | `/\b[MN][A-Za-z0-9]{23,}\.…\b/` | Discord bot token |

HIGH Severity — Database & Infrastructure (action: redact)

| Pattern ID | Label | Regex | Example Match |
| --- | --- | --- | --- |
| database-url | database connection string | `/\b(postgres\|postgresql\|mysql\|mongodb\|mongodb\+srv\|redis\|rediss):\/\/...\b/i` | `postgres://user:pass@host:5432/db` |
| supabase-key | Supabase key | `/\bsbp_[A-Za-z0-9]{40,}\b/` | Supabase project API key |
| vercel-token | Vercel token | `/\b(vc_prod_\|vc_test_)[A-Za-z0-9]{20,}\b/` | Vercel deployment token |
| jwt-token | JWT token | `/\beyJhbGciOi...\b/` | `eyJhbGciOiJIUzI1NiIs...` (3-part base64) |

MEDIUM Severity

| Pattern ID | Label | Regex | Default Action | Example |
| --- | --- | --- | --- | --- |
| ip-address | IP address | `/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/` | redact | `192.168.1.100` |
| email-address | email address | `/\b[\w.+-]+@[\w-]+\.[\w.]+\b/` | include | `dev@example.com` |

LOW Severity

| Pattern ID | Label | Regex | Default Action | Example |
| --- | --- | --- | --- | --- |
| home-path | home directory path | `/(Users\|home)/<username>/` | include | `/Users/dogu/...` |

Dynamic Pattern

The home-path pattern is generated at runtime using os.userInfo().username. If the username can't be determined, this pattern is omitted.

Pattern Caching

Patterns are generated once and cached in memory (cachedPatterns). The username lookup for home-path happens only on first access.
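
A minimal sketch of how such a runtime-generated, cached pattern list could look. Only `os.userInfo()` and the `cachedPatterns` name come from this page; `escapeRegExp`, `buildHomePathPattern`, `lookupUsername`, and `getPatterns` are hypothetical, and the single static pattern stands in for the full list.

```typescript
import * as os from "node:os";

// Escape regex metacharacters so a username like "j.doe" is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the home-path regex for a username, or null when the username is unknown.
function buildHomePathPattern(username: string | undefined): RegExp | null {
  if (!username) return null;
  return new RegExp(`/(Users|home)/${escapeRegExp(username)}`);
}

// os.userInfo() throws when the user has no username or home directory.
function lookupUsername(): string | undefined {
  try {
    return os.userInfo().username;
  } catch {
    return undefined;
  }
}

let cachedPatterns: RegExp[] | undefined; // undefined means "not built yet"

// Build the list on first access, then reuse the same array.
function getPatterns(): RegExp[] {
  if (cachedPatterns === undefined) {
    const staticPatterns = [/\bAKIA[0-9A-Z]{16}\b/]; // stand-in for the static patterns
    const home = buildHomePathPattern(lookupUsername());
    cachedPatterns = home ? [...staticPatterns, home] : staticPatterns;
  }
  return cachedPatterns;
}
```

Escaping the username is a design choice of this sketch: without it, a dot in the username would match any character.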

Custom Redaction Functions

Some config entries have custom redaction that runs before pattern-based scanning:

EntryFunctionWhat It Does
SSH configredactSshConfig()Replaces HostName <value> lines with HostName [REDACTED]
npm configredactNpmTokens()Replaces _authToken=<value> with _authToken=[REDACTED]

Custom redaction preserves file structure while removing specific sensitive values. Pattern-based scanning then handles anything the custom function missed.
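
The two functions might look roughly like this. The function names and the `[REDACTED]` marker come from this page, but the bodies are an assumption based only on the descriptions in the table, not the actual implementation.

```typescript
const REDACTION_MARKER = "[REDACTED]";

// Replace the value on every "HostName <value>" line, keeping the indentation
// and the keyword so the SSH config stays readable.
function redactSshConfig(content: string): string {
  return content.replace(/^([ \t]*HostName[ \t]+)\S.*$/gim, `$1${REDACTION_MARKER}`);
}

// Replace everything after "_authToken=" up to the end of the line.
function redactNpmTokens(content: string): string {
  return content.replace(/(_authToken=).+/g, `$1${REDACTION_MARKER}`);
}
```

Both functions keep the key and replace only the value, so the surrounding lines are left untouched.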

Match Truncation

All matches in scan findings are truncated to 40 characters. If the original match is longer, it is cut and ... is appended (43 characters at most). This limits how much of a sensitive value can appear in scan report output.
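
As a sketch, the truncation rule amounts to the following. The names `MAX_MATCH_LENGTH` and `truncateMatch` are hypothetical.

```typescript
const MAX_MATCH_LENGTH = 40; // hypothetical constant name

// Keep at most 40 characters of a match and mark longer values with "...".
function truncateMatch(match: string): string {
  return match.length > MAX_MATCH_LENGTH
    ? match.slice(0, MAX_MATCH_LENGTH) + "..."
    : match;
}
```

A 50-character match therefore comes back as 43 characters, while a match of 40 characters or fewer is returned unchanged.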

Sensitivity Report

After every collect and backup (with redaction enabled), a summary is printed:

⚠ Sensitivity report:
  HIGH   ~/.ssh/id_ed25519         private key — skipped
  HIGH   ~/.npmrc                  auth token — redacted
  MEDIUM ~/.gitconfig              email address — included

  2 items redacted, 1 skipped. Use --no-redact to include all.

Each line shows:

  • Severity level of the top finding
  • File path (padded to 30 chars)
  • Pattern label and action taken

Files with no findings are omitted from the report.

Overriding Redaction

```bash
dotfiles collect --no-redact    # Include everything
dotfiles backup --no-redact     # Backup without redaction
```

Use with caution

--no-redact disables all sensitivity handling. Private keys, API tokens, and database passwords will be included in plain text. Only use this when you fully control the storage destination (e.g., an encrypted, private repository).

How Redaction Works Internally

The applyRedactions() function:

  1. Takes file content and a ScanResult
  2. If the result action is not redact, returns content unchanged
  3. For each finding, replaces the matched text with [REDACTED] using the pattern's regex
  4. The REDACTION_MARKER constant ("[REDACTED]") is defined in src/utils/constants.ts and used consistently across the codebase
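
The steps above can be sketched as follows. The `ScanResult` and `Finding` shapes are assumptions made for illustration; only `applyRedactions` and `REDACTION_MARKER` are names from this page, and the real function may differ.

```typescript
const REDACTION_MARKER = "[REDACTED]";

type Action = "skip" | "redact" | "include";

// Hypothetical shapes: each finding carries the regex of the pattern that matched.
interface Finding {
  regex: RegExp;
}
interface ScanResult {
  action: Action;
  findings: Finding[];
}

function applyRedactions(content: string, result: ScanResult): string {
  // Only "redact" results change the content.
  if (result.action !== "redact") return content;

  let redacted = content;
  for (const finding of result.findings) {
    // Add the global flag so every occurrence is replaced, not just the first.
    const flags = finding.regex.flags.includes("g")
      ? finding.regex.flags
      : finding.regex.flags + "g";
    const globalRegex = new RegExp(finding.regex.source, flags);
    redacted = redacted.replace(globalRegex, () => REDACTION_MARKER);
  }
  return redacted;
}
```

Using a replacer function instead of a replacement string avoids any special handling of `$` sequences in the marker.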

The restore system checks for REDACTION_MARKER in backup files — any file containing it gets redacted status and is automatically skipped during restore to prevent writing masked values to the machine.
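
A sketch of that restore-time check, assuming backup files are available as a path-to-content map. `isRedactedBackup` and `planRestore` are hypothetical names, not functions from the tool.

```typescript
const REDACTION_MARKER = "[REDACTED]";

// True when a backed-up file still contains masked values.
function isRedactedBackup(content: string): boolean {
  return content.includes(REDACTION_MARKER);
}

// Split backup files (path -> content) into those safe to restore
// and those skipped because they contain the redaction marker.
function planRestore(files: Record<string, string>): { restore: string[]; skipped: string[] } {
  const restore: string[] = [];
  const skipped: string[] = [];
  for (const path of Object.keys(files)) {
    const content = files[path];
    (isRedactedBackup(content) ? skipped : restore).push(path);
  }
  return { restore, skipped };
}
```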