Stopwords and training of current models

orlov · June 12, 2024, 8:44am

Does somebody have a source for me to clear the question:

are current models (current means not only by OpenAI) trained with or without stopwords?

_j · June 12, 2024, 10:13am

Chat models have stop sequences - just a single token used for the end of a message or end of a function that, if produced, terminates the output.

Otherwise the AI would produce text forever, because the whole way it works is one-directional iterative next-token generation.

orlov · June 12, 2024, 10:32am

I mean stopwords, as used for NLTP preparing: a, and, in…

_j · June 12, 2024, 3:50pm

In that language AI model are able to write English and others properly, from the corpus they were pretrained on, it would be apparent that particular parts of language were not removed.

orlov · June 12, 2024, 5:10pm

that was my question: are they indeed trained on data containing stopwords? Do you probably have a source for me?

maryam.farokhmehr · June 12, 2024, 11:20pm

Im not sure but i think you are refering to an old approach that we used to remove stop words. As far as i know we train the models with phrases that are the devisions of sentences into phrases . They might still have stop words in them. The raw data contains everything comtainkng stopwords

Topic		Replies	Views
Implementing stop words or phrases API	4	1755	October 23, 2023
How to stop getting replies with Ai sounded bording words? Prompting chatgpt , ai	3	374	August 27, 2024
How to stop a fine-tuned model from generating additional tokens? API	2	1543	February 23, 2022
Does OPEN AI API use API data for training? API	5	17581	February 28, 2024
How to do not show some words? API	6	1371	May 6, 2024

Stopwords and training of current models

Related topics