Fine tuning vs. multishot questions

With the messages thing I can hand the bot several lines of the conversion before the conversation begins. (I think they call this a multi-shot question) so, taken to the extreme does that mean that if I give the bot hundreds of lines of Q&A on one topic, (and then I assume I have tell the bot that the user doesn’t know any of that) then would I have an expert on my topic? Who would start from scratch and patently explain it all to the user?

and if that does work, how is that different from “fine tuning.”

I can easily imagine that “fine-tuning” is simply a way of saving all the tokens of this massive multi-shot question every time, or does “fine-tuning” actually retrain and re-weight my own copy of the NN?

Yes, fine tuning does change the weights of the underlying model using supervised learning. Probably just the last layer (as a standard fine tuning mechanism), but OpenAI has not made the details of their fine tuning process public.

The other procedure is usually called in-context learning, as the model learns how to generate in-domain completions via prompting only (context). There’s no model change on this regard: you are working with the exact same base model as everybody else.

Hope that helps!

1 Like

Ah, just the last layer, that make sense. And “in context learning” thanks for that.

and is there “start from scratch” command that separates the “in context learning” from the live conversation?

There is no a specific command to distinguish the “knowledge” from the actual conversation, but you can try different formulas and see which one works best for you :slight_smile:. Some examples:

  • Incorporate all the domain knowledge in the system message and reserve the user and assistant messages for the QA only.
  • Incorporate the knowledge in the first user message and then specific clearly where does the conversation start with something like “Now, let’s move on to the actual QA conversation. First question: “…

In addition: you might also want to consider exploring semantic search as the context length limitation won’t allow you to include arbitrarily large amount of knowledge in just one prompt.

Just tell it to interpret some uncommon token as ending the context and beginning the chat. I usually do it in a slightly less efficient but more readable way, like this

#!# Begin Context #!# 
<.... put context ...>
#!# End Context #!#

Instructions: <through to end of line, dont include newlines here>

Prompt: < now put the initial prompt here, and go to user input loop >

The placement of the newlines in there seems to help (like having the #!# … #!# on their own line and separating messages from different “users” with two of them.

Edit: also, keeping related ideas in the same message helps (really it’s about proximity in the character stream with divider token sets adding more distance), and going back to edit previous messages and rebranching the conversation tree are important for developing prompts for this kind of task, you’ll get completely different results if you try to teach it over a long chat, and it will have forgotten the beginning by the time you get to the end.

Thanks This is very helpful.
So context info up to about 3/4 of my available tokens, beyond that, fine tuning.

Given that the context learning info has to be in Q&A form, I figured I had to use user/assistant messages for that. (and I’d be building the fine tuning dataset at the same time)

And was guessing that I would use a system message along the lines of “Now you have a new User, who does not know any of the previous information.” as the break point.

(boy its fun being back on the bleeding edge of compsci!)
Thanks again.

1 Like

Just dont forget that fine tuning reduces the flexibility of the model drastically.

You can give it context in jsonl like this:

#!# Begin Context (Jsonl) #!#
{asker: <nick1>, answerer: <nick2>, q: "q text here", a: "a text here"},
#!# End context #!#

The length of the line doesnt matter, just the number of tokens.

I wouldn’t feed the context in as separate chats to the chat completion endpoint, there’s spacer and formatting prompting going on that we don’t have a clear view into between those messages, it’s not just feeding them into the model directly as a character stream i dont think. If you want to ask a question or do some kind of analysis on a bunch of messages all at once, you might be better off including them within one prompt to the completions endpoint (or as part of the system prompt in a length 2-3 chat completion chain). But thats purely intuitive speculation having tried many different ways to do things like analyze previous chatgpt conversation logs, it’s possible that if the logs were between 2 or more human participants things would go differently.

#!# User <nick> Enters; <nick> has no context #!# 

may work, since the context token is already understood as background information, and that emulates existing chatroom logs the bot has already consumed.

Hell yeah, same here, i thought I’d missed out.

Thanks for your help, but I’m already done.
Its programmed, responding as desired and on the web!
GPT-4 wrote a good chunk of the .js code to get the API on the web page for me!