How does the tool de-duplication work?

I use sentence embeddings (I could actually use OpenAI embeddings now) + autofaiss (GitHub: criteo/autofaiss, which automatically creates Faiss knn indices with suitable similarity-search parameters) + clustering to de-duplicate my datasets. I've seen that the OpenAI tool seems to do de-duplication too, but I guess it's rather some string heuristic such as the Sørensen–Dice coefficient, no?
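To illustrate the embedding-based approach, here's a minimal sketch of the idea with plain NumPy: embed each item, then drop anything whose cosine similarity to an already-kept item exceeds a threshold. The brute-force similarity search below is what a Faiss index (e.g. one built by autofaiss) would replace at scale, and the 2-d vectors are toy placeholders for real sentence embeddings; the threshold value is just an example.

```python
import numpy as np

def dedup_by_embedding(embeddings: np.ndarray, threshold: float = 0.9) -> list:
    """Return indices of items to keep, dropping near-duplicates.

    Brute-force cosine similarity; at scale, this nearest-neighbour
    search is what a Faiss knn index would handle instead.
    """
    # L2-normalise so a dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    keep = []
    for i, vec in enumerate(normed):
        # keep item i only if it is not too similar to any kept item
        if all(vec @ normed[j] < threshold for j in keep):
            keep.append(i)
    return keep

# Toy 2-d "embeddings": rows 0 and 1 are near-duplicates
vecs = np.array([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]])
print(dedup_by_embedding(vecs))  # → [0, 2]
```

A clustering step (as mentioned above) would group the near-duplicate pairs found this way instead of greedily keeping the first occurrence.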
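For comparison, the kind of string heuristic guessed at above is simple to sketch: the Sørensen–Dice coefficient over character bigrams measures surface overlap between two strings, with no embeddings involved. (Whether OpenAI's tool actually uses this is speculation, as noted.)

```python
def dice_coefficient(a: str, b: str) -> float:
    """Sørensen–Dice similarity over character bigrams: 2|A∩B| / (|A|+|B|)."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a and not bigrams_b:
        return 1.0  # two empty/single-char strings count as identical
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))

# Classic example: only the bigram "ht" is shared
print(dice_coefficient("night", "nacht"))  # → 0.25
```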