Client-side secret redaction for LLM prompts (LeakGuard MVP)

I’ve been working on a Chrome extension that acts as a client-side privacy layer for LLM usage.

The idea:
Detect likely secrets in the prompt before it’s sent, replace them with local placeholders (e.g. [PWM_1]), and ensure only redacted data leaves the browser.

What’s currently working:

  • deterministic mapping (same secret → same placeholder)

  • idempotent behavior (already-redacted input stays unchanged)

  • mixed input handling (raw + placeholder in same prompt)

  • detection of common patterns (API keys, tokens, JWTs, connection strings, etc.)

  • verified via DevTools that outbound payloads contain only placeholders
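
To make the properties above concrete, here is a minimal sketch of the redaction step. It is not LeakGuard's actual code: the `Redactor` class, the `PATTERNS` list, and the three regexes are hypothetical and far from complete. It only illustrates how deterministic mapping and idempotence can fall out of a per-session lookup table.

```typescript
// Hypothetical sketch; names and patterns are illustrative assumptions.
const PATTERNS: RegExp[] = [
  /\bAKIA[0-9A-Z]{16}\b/g,                                // AWS access key ID shape
  /\beyJ[\w-]+\.[\w-]+\.[\w-]+/g,                         // JWT-like three-part token
  /\b[a-z][a-z0-9+]*:\/\/[^\s:@\/]+:[^\s@\/]+@[^\s]+/gi,  // URI with embedded credentials
];

class Redactor {
  // Secret -> placeholder, kept only in local memory.
  private bySecret = new Map<string, string>();
  private counter = 0;

  // Same secret always yields the same placeholder (deterministic mapping).
  private placeholderFor(secret: string): string {
    let placeholder = this.bySecret.get(secret);
    if (placeholder === undefined) {
      this.counter += 1;
      placeholder = `[PWM_${this.counter}]`;
      this.bySecret.set(secret, placeholder);
    }
    return placeholder;
  }

  // Placeholders such as [PWM_1] match none of the patterns above,
  // so already-redacted text passes through unchanged (idempotence).
  redact(text: string): string {
    let out = text;
    for (const pattern of PATTERNS) {
      out = out.replace(pattern, (match) => this.placeholderFor(match));
    }
    return out;
  }
}
```

In this sketch the whole connection string is replaced rather than just the password, which is a design choice, not a requirement. Idempotence here depends on the placeholder format never matching a detector, which is worth asserting in tests whenever a pattern is added.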

This is not meant to be “perfect security.” It is a safety layer to reduce accidental leakage during day-to-day LLM usage.

What I’m looking for:

  • where would you try to break this?

  • what edge cases am I missing?

  • how would you approach unknown secret detection (entropy vs context)?
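
To frame the entropy vs context question, here is a rough sketch of how the two signals could be combined. Everything in it is an assumption for discussion: the function names, the keyword list, the 20-character minimum, and the 4.0 bits/char threshold are not taken from LeakGuard.

```typescript
// Shannon entropy in bits per character (computed over UTF-16 code units).
function shannonEntropy(s: string): number {
  const counts = new Map<string, number>();
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let h = 0;
  counts.forEach((count) => {
    const p = count / s.length;
    h -= p * Math.log2(p);
  });
  return h;
}

// Context signal: a value assigned to a suspicious-looking key name.
// The keyword list is illustrative, not exhaustive.
const CONTEXT = /(?:password|passwd|secret|token|api[_-]?key)\s*[:=]\s*["']?([^\s"']+)/i;

// Assumed thresholds; they would need tuning against real prompts.
const MIN_LENGTH = 20;
const MIN_ENTROPY = 4.0;

function looksLikeSecret(token: string, line: string): boolean {
  // Context wins even for short, low-entropy values like "hunter2".
  const contextual = CONTEXT.exec(line);
  if (contextual !== null && contextual[1] === token) return true;
  // Otherwise fall back to entropy, gated by length to limit false positives.
  return token.length >= MIN_LENGTH && shannonEntropy(token) >= MIN_ENTROPY;
}
```

The trade-off this makes visible: entropy alone misses short human-chosen passwords and can flag hashes or UUIDs that are not sensitive, while context alone misses secrets pasted without a label. Using both, with context allowed to override the entropy gate, is one possible starting point.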

Repo: petritbahtiri123/LeakGuard on GitHub