How does GPT-3 handle Synonyms


I’m only a few days into beta so apologies if I’ve missed a resource I should have checked.
Is GPT-3 designed to handle synonyms? For example, if I use the playground to look up:
I get back:
ReactJS, AngularJS, node.js, and many more.

If I try:
I get back:
React.JS, Angular.JS, node.js, and many more.

It seems it treats React.JS, ReactJS, AngularJS and Angular.JS as individual terms.

Welcome to the community, Atul! To make sure I understand your question - could you share more about your use case and the kind of output you’re hoping to receive from the API?

@raf my expectation, from playing around with GPT-3, is that words/tokens such as And, AND, and and are equivalent.

I was hoping that tokens/words such as ReactJS and React.JS would be rationalised into the most common term.

The use case is analysing technical engineering skills.

Hope this makes sense?

You can access the tokenizer here: OpenAI API, if you want to see how things get broken down.


“ReactJS” and “React.JS” are translated into different arrays of tokens: [3041, 529, 20120] and [3041, 529, 13, 20120], respectively. I guess that if both spellings are commonly used in the wild, choosing one over the other will not make a massive difference in your app. That said, these are different token sequences, so the model sees different inputs and is not guaranteed to produce identical completions, even with temperature=0.
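
If you need a single spelling for your skills analysis, one option is to rationalise the terms yourself, before or after calling the API, rather than relying on the model to do it. Here is a minimal sketch of that idea, assuming that case and punctuation are the only differences between variants. The names `skill_key` and `rationalise` are hypothetical helpers made up for this example; they are not part of the OpenAI API.

```python
import re
from collections import Counter
from typing import Dict, List


def skill_key(term: str) -> str:
    """Lower-case the term and drop everything except letters and digits,
    so that 'React.JS' and 'ReactJS' end up with the same key."""
    return re.sub(r"[^a-z0-9]+", "", term.lower())


def rationalise(terms: List[str]) -> List[str]:
    """Replace each term with the most frequent spelling sharing its key.

    Ties go to the spelling that appeared first in the input.
    """
    counts: Dict[str, Counter] = {}
    for term in terms:
        counts.setdefault(skill_key(term), Counter())[term] += 1

    # For each key, pick the spelling seen most often.
    canonical = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    return [canonical[skill_key(term)] for term in terms]


if __name__ == "__main__":
    skills = ["ReactJS", "React.JS", "ReactJS", "Angular.JS", "node.js"]
    print(rationalise(skills))
```

Be aware that a key this aggressive would also merge genuinely different skills such as C, C++ and C#, since the punctuation is stripped, so a real list of skills would probably need a few explicit exceptions.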

Hope this helps! Please let us know if you have more questions on this.