Does Hashing Tokens Provide Privacy in LLM Training?

:thinking: Question: Is Hashing Still Useful in LLM Training if the Model Can Just Learn the Patterns?

Hey everyone,

I’ve been experimenting with fine-tuning a language model and had a question I couldn’t stop thinking about.

Let’s say I preprocess my dataset by hashing certain words — either for privacy (like names or places) or just to obfuscate common tokens. So instead of training on the word itself, the model sees its hashed version (e.g., hash("John")). The idea is that the model shouldn’t know what the original word was, right?
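For concreteness, here's roughly the kind of preprocessing I mean. This is just a toy sketch: `SENSITIVE`, `hash_token`, the truncated SHA-256, and the whitespace tokenization are placeholders, not what I'd actually ship.

```python
import hashlib

# Placeholder list of terms to obfuscate; in practice this would come from
# an NER pass or a curated list of names/places.
SENSITIVE = {"John", "London"}

def hash_token(token: str) -> str:
    # Deterministic: the same word always maps to the same hashed token.
    return "H_" + hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]

def preprocess(text: str) -> str:
    # Naive whitespace tokenization, just to keep the example small.
    return " ".join(hash_token(t) if t in SENSITIVE else t for t in text.split())

print(preprocess("John met Alice in London"))
# e.g. "H_xxxxxxxxxxxx met Alice in H_yyyyyyyyyyyy" (same replacement every time "John" appears)
```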

But then I thought — if that hashed token shows up in similar contexts often enough, wouldn’t the model just learn what it means anyway? Like, even though it’s hashed, it becomes just another token that gets mapped to a concept — kind of defeating the purpose of hashing in the first place.


So I’m wondering:

  • Does hashing tokens actually protect anything when training large models — or is it just as learnable as regular words, given enough examples?
  • Would using something like salted hashes help (where every occurrence hashes to a different value), or would that just introduce noise? (Rough sketch of what I mean right after this list.)
  • Is hashing more useful at inference time only, rather than during training?
  • Has anyone tried this and found it helpful (or not)?
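To make the salted-hash question concrete, here's a rough before/after sketch (again toy code; the truncated SHA-256 and the 8-byte random salt are arbitrary choices):

```python
import hashlib
import os

def deterministic_hash(token: str) -> str:
    # Same input -> same output, so the model can still learn a stable
    # meaning for the hashed token over many examples.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]

def salted_hash(token: str) -> str:
    # Fresh random salt per occurrence -> every instance becomes a unique
    # token, so there is no consistent mapping left for the model to learn
    # (which also destroys whatever signal the token carried).
    salt = os.urandom(8)
    return hashlib.sha256(salt + token.encode("utf-8")).hexdigest()[:12]

print(deterministic_hash("John"), deterministic_hash("John"))  # identical
print(salted_hash("John"), salted_hash("John"))                # different every call
```

The deterministic version stays learnable exactly as described above; the per-instance salted version removes the stable mapping, which is basically my "just noise" worry.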

I’m asking this in the context of a RAG-based system I’m working on, where we’re trying to protect semi-private info in the training data. I know there are more advanced approaches (like differential privacy), but I’m curious if anyone here has explored simple hashing strategies and what came out of it.

Would love to hear from anyone who’s tested this or just has thoughts on the idea! :raising_hands:

Thanks!

Why not just anonymize the data using random substitutes?

Anonymizing data with random substitutes or hashes isn't enough on its own: surrounding context can still reveal identities, meaning can break in sensitive domains, the model can still learn the patterns, and unsalted hashes of low-entropy values like names can be reversed with a lookup table, so re-identification remains a risk.
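For example, unsalted hashes of common first names are trivial to reverse with a precomputed lookup table. A quick sketch, assuming a plain truncated SHA-256 scheme like the one discussed above:

```python
import hashlib

def hash_token(token: str) -> str:
    # Assumed scheme: plain (unsalted) truncated SHA-256 of the word.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]

# An attacker who guesses the scheme can precompute hashes for a list of
# candidate names (real name lists have tens of thousands of entries, still cheap).
CANDIDATES = ["Alice", "Bob", "John", "Maria"]
lookup = {hash_token(name): name for name in CANDIDATES}

leaked = hash_token("John")   # a hashed token observed in the training data
print(lookup.get(leaked))     # -> "John"
```

A secret salt or keyed hash blocks this particular lookup, but the context-leakage and utility problems above still apply.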