I have to develop a chatbot using GPT-3 (fine-tuning it and so on) and turn it into an LLM that will be used for a chatbot. Now, it is my understanding that the chatbot will be used for practicing conversations in different languages. Thus, I need to restrict its vocabulary to only really basic words. My plan was to fine-tune the model so that it favors these words. However, since that would only increase the chance of these words getting generated, I was wondering if there was an easier and more reliable way of achieving this.
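For illustration, one complementary approach is to validate replies on the application side: check each generated reply against the allowed-word list and retry or post-edit when it strays. A minimal sketch (the word list and function name here are just hypothetical examples):

```python
import re

def out_of_vocab(reply: str, allowed: set[str]) -> list[str]:
    """Return the words in a model reply that are not on the allowed list."""
    words = re.findall(r"[a-zA-Z']+", reply.lower())
    return [w for w in words if w not in allowed]

# Tiny stand-in for the real 1,000+ word list.
allowed = {"i", "like", "to", "eat", "apples", "and", "bread"}
print(out_of_vocab("I like to eat apples and pizza", allowed))  # ['pizza']
```

If the check flags anything, the app can re-ask the model with a correction instruction, which makes the vocabulary restriction reliable even when the model's own probabilities are not.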
Put your own knowledge base in a PDF, and be very specific in the instructions you pass to the chatbot; specify each step in detail.
At startup, have it verify only your knowledge base, called Language.pdf.
Inside this file you specify in detail what it has to do.
The thing is, the list of accepted words exceeds 1,000 words, and adding them all to the prompt every time wouldn't be cost-effective.
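A rough back-of-envelope illustrates the cost concern (the tokens-per-word ratio is an assumption; a tokenizer like tiktoken would give exact counts):

```python
# Back-of-envelope: overhead of repeating a 1,000-word allow-list in every prompt.
words = 1000
tokens_per_word = 1.3          # rough average for English text (assumption)
extra_tokens = int(words * tokens_per_word)
print(extra_tokens)            # ~1300 extra input tokens on every single request
```

That overhead is paid on every request, which is exactly why fine-tuning the style into the model is attractive here.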
Gotta fine tune, indeed.
The AI has been trained on a massive body of knowledge. It doesn't do a very good job of following instructions for either simple children's writing or highly technical writing.
It would take a massive training set to emphasize just a subset of simple words.
You also would have to train in the context of how the AI responds simply to given user inputs of your application.
The best way would be to begin with a base model … which means teaching the AI how to "chat" all over again, as a base model comes with none of the chatting skills you might expect from ChatGPT models.
Have you tried some basic prompting techniques like "reply with the vocabulary of a 5-year-old"?
If you can identify a role model and prompt the model to follow the patterns you may have a fast and easy solution at hand.
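A minimal sketch of that prompting approach, assuming the OpenAI chat API (the system-prompt wording is just one example; with the `openai` package and an API key, the payload would be sent via `client.chat.completions.create(**payload)`):

```python
# Request payload for a chat completion that constrains vocabulary via the
# system prompt. Built as plain data here; the actual API call is noted above.
payload = {
    "model": "gpt-3.5-turbo",
    "max_tokens": 150,  # also helps keep replies short
    "messages": [
        {"role": "system",
         "content": "Reply with the vocabulary of a 5-year-old. "
                    "Use only short, very common words."},
        {"role": "user", "content": "Can you explain what a library is?"},
    ],
}
print(payload["messages"][0]["content"])
```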
It’s possible to change the vocabulary and writing style with just a prompt, but it takes a bit of work. Mine still isn’t 100% reliable… Second image is Easy to Read, I believe…
@PaulBellow Since fine-tuning is an option, didn't we have a post from some experienced authors about how to go about it?
The combination of both should improve the results even further.
Hrm. Might be? I don’t recall off-hand, but search might bring them up.
I just wanted to mention that it can be done without fine-tuning if you’re careful with prompting, etc.
Found it:
I just remembered you shared it earlier.
Ah, gotcha. Thanks. My brain is so fried lol…
That was indeed the post… and a good one from an author.
Hey! I do the same thing with a WhatsApp client that helps Spanish people learn English!
There are two rules:
Hey @vb and @PaulBellow, I've been meaning to write an update to what we've been doing over at Future Fiction Academy. We have multiple authors with GPT-3.5 16k fine-tunes that write outputs that are indistinguishable from human writing.
Fine-tunes do appear to put more attention on the tokens in the dataset. For example, have a character say "Dag nab it" and suddenly there's a higher probability that phrase will be in your output.
For the original problem of making GPT-3 write more simply, like a child:
Have a system prompt explaining what you want, then user prompts that match how you plan to prompt it in the future. (Note: to marry the system prompt in the dataset with later user prompts sans system prompt, make a persona like SpeakSimplySimon, and then in user prompts say "Be SpeakSimplySimon" and give it the task and parameters.)
If your Assistant responses are all examples of the simpler writing you want, that’s how that fine tune will respond.
Start with just 10-20 samples and see how it moves the needle. Make sure you see it making a difference, because you might need to tweak the prompting and control length with a maximum token output versus relying on it to naturally keep responses short.
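The dataset those steps describe can be sketched like this — the persona and example texts are illustrative, but the `{"messages": [...]}` JSONL layout matches OpenAI's chat fine-tuning format:

```python
import json

# A few training samples pairing the "Be SpeakSimplySimon" user prompt with
# assistant replies written in the simple style the fine-tune should learn.
samples = [
    {
        "messages": [
            {"role": "system", "content": "You are SpeakSimplySimon. You only use very easy words."},
            {"role": "user", "content": "Be SpeakSimplySimon. Tell me about the sun."},
            {"role": "assistant", "content": "The sun is a big, hot star. It gives us light and keeps us warm."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "You are SpeakSimplySimon. You only use very easy words."},
            {"role": "user", "content": "Be SpeakSimplySimon. What is rain?"},
            {"role": "assistant", "content": "Rain is water that falls from clouds. It helps plants grow."},
        ]
    },
]

# Fine-tuning files are uploaded as JSONL: one JSON object per line.
with open("speak_simply.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

Because every assistant message is an example of the target style, the fine-tune learns to respond that way whenever the persona is invoked.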
HTH
Hey, thanks for stopping by to check in.
Sounds like y’all are doing great things with fine-tuning.
Do you have a link for myself or anyone interested?