Recommendations for improving the new instruct model?

I’ve been experimenting with GPT3.5 instruct vs. chat and noticed that the chat model is way better in following a required JSON output format. This surprises me a little. I would like to use the instruct model because it’s 2-3x faster and I often prefer it’s concise answering style. My experiment went something like this:

Chat:

  • Message 1: System - The actual instructions
  • Message 2: User - Example Input
  • Message 3: Assistant - Example Output
  • Message 4: User - New Input

The results we re spot on every single time.

Instruct (as a multi line string):
The actual instructions.
Input: Example Input
Output: Example Output
Input: New Input
Output:

This made many mistakes with the JSON format or continuing to produce predictions after the first one.

Any pointers how to improve the instruct model?

I would recommend you read the InstructGPT paper and see the kind of inputs that the model was made into “instruct” on.

(particularly, note that the example of “Marv” found on page 29 - the exact thing they show as a success in fine-tune examples is already exactly something that doesn’t need your fine-tune to perform)

You will see the tuning is basic, simple, one-shot. Chat examples are to pal around and make it to the second turn, not have extended problem-solving sessions.

We can see the similar behavior and limited repertoire in gpt-3.5-turbo-instruct.

For best success, I would treat it absolutely as a completion model, giving it the prompting to put it into a writing state, and then multishot up to what it takes to get an answer.

The model will take to few-shot training very quickly. If it “says” it is something in two turns, that is what it “is” almost permanently when included in context. That means you also must have a long multishot before you let it loose to chat, because otherwise it will overtrain itself on its own outputs to be a one task one answer machine.

Here’s a full prompt input to the AI model. “system” programming and 3-shot example output (including this very answer) before grabbing one of the newest questions from the forum to see how AI can answer it.

Summary

Host:

I’m _j, the host of AI forums, the place where experts go to discuss and learn about the latest developments and programming techniques in artificial intelligence. Thank you for purchasing my consultation package, including up to ten high-quality well articulated solutions to implementing your AI projects and solving your programming tasks.

User:

I’ve been experimenting with GPT3.5 instruct vs. chat and noticed that the chat model is way better in following a required JSON output format. This surprises me a little. I would like to use the instruct model because it’s 2-3x faster and I often prefer it’s concise answering style. My experiment went something like this:

Chat:
Message 1: System - The actual instructions
Message 2: User - Example Input
Message 3: Assistant - Example Output
Message 4: User - New Input

The results we re spot on every single time.

Instruct (as a multi line string):
The actual instructions.
Input: Example Input
Output: Example Output
Input: New Input
Output:

This made many mistakes with the JSON format or continuing to produce predictions after the first one.

Any pointers how to improve the instruct model?

Host:

I would recommend you read the InstructGPT paper and see the kind of inputs that the model was made into “instruct” on.

https://arxiv.org/pdf/2203.02155.pdf

(particularly, note that the example of “Marv” found on page 29 - the exact thing they show as a success in fine-tune examples is already exactly something that doesn’t need your fine-tune to perform)

You will see the tuning is basic, simple, one-shot. Chat examples are to pal around and make it to the second turn, not have extended problem-solving sessions.

We can see the similar behavior and limited repertoire in gpt-3.5-turbo-instruct.

For best success, I would treat it absolutely as a completion model, giving it the prompting to put it into a writing state, and then multishot up to what it takes to get an answer.

The model will take to few-shot training very quickly. If it “says” it is something in two turns, that is what it “is” almost permanently when included in context. That means you also must have a long multishot before you let it loose to chat, because otherwise it will overtrain itself on its own outputs to be a one task one answer machine.

User:

I’m part of two different organization accounts. One has hit rate limits, the other has not. I’ve tried to switch over to the other organisation, generate a new API Key and run with that instead. Despite doing this and changing the default API organization,

At first I was getting this error returned: You’ve exceeded your current monthly quota on OpenAI’s API.

I am getting now getting this error message for the same API key:

OpenAI’s API returned: "Incorrect API key provided: sk-ndSgX***************************************u1Cm.

I have tried creating and using additional secret keys, all of which authenticate successfully but I still get this response.

Host:

You can include the organization ID as part of the API request to bill directly for every request. That will override the “default” and make clear where the key usage is being billed.

Then you just must be a member of that organization to use it.

User:

Hi there

I use gpt3.5 turbo. Temperature=0
I am analyzing transcript of YouTube videos
My prompt basically says that OpenAI should analyze the transcript and see if a given topic (a user input) is mentioned or not. If it’s mentioned, it should return its sentiment (positive, negative or neutral) and why the sentiment was chosen in a few words. If it’s not mentioned it should return not mentioned

Yet after numerous attempts at prompting, I keep getting randomly some output such as:
Topic : Negative - the topic is not mentioned in the transcript.

Instead I would expect :
Topic : Not mentioned
And this is actually what I get in most cases.
But randomly I get the Negative sentiment associated with a not mentioned reasoning

(In the example above topic is in reality replaced by the user input)

I am starting to consider to make another call to OpenAI to simply refine the initial answer so that it cleans up the inconsistencies.

Any thoughts?

Host:

This works for a structured output:

system

You are Extracto, a YouTube transcript analyzer. User provides their topic to search for, and a transcript from a video, and you choose and produce this binary json response, deciding if the topic is found or not, and if so, providing a sentiment analysis of transcript:

{“topic”: string,
“topic_found”: boolean,
“sentiment”: string,
“sentiment_reasoning”: string,
}

sentiment options:
positive, neutral, negative, n/a

sentiment is only measured if the topic is found, otherwise n/a

However, gpt-3.5-turbo is so crippled that it says false for the topic “healthcare” with a transcript including *“[0:15]*James Tyree: Hello and welcome to the TSET Better Health Podcast! This is James Tyree, health communication consultant” I added more prompt (seen above), and for reliability, one has to go to gpt-3.5-turbo-0301 or gpt-4 for a bot that can still think:

{“topic”: “Healthcare”,
“topic_found”: True,
“sentiment”: “positive”,
“sentiment_reasoning”: "The transcript discusses the latest research on the appeal

User:

Any tools out there to pull/scrape complete website data and feed it into GPT?
I am trying to convert a website into chatbot version with FAQs. I want to extract text from all the site links and then auto-categorize and feed into GPT. Anyone out there who has done this?

Host:

So the “chat” question inserted into a completion:

Any tools out there to pull/scrape complete website data and feed it into GPT?
I am trying to convert a website into chatbot version with FAQs. I want to extract text from all the site links and then auto-categorize and feed into GPT. Anyone out there who has done this?

And the supervised instruct answer trained on writing as me:

Yes, there are tools available to scrape website data and feed it into GPT. Some popular options include BeautifulSoup, Scrapy, and Selenium. These tools allow you to extract text from websites and organize it into a format that can be easily fed into GPT for training or use in a chatbot. You may also want to consider using a web crawler to automatically navigate through the website and gather data from multiple pages. However, be sure to check the website’s terms of service before scraping any data, as some sites may prohibit this practice.

I’ll go paste it there!

Thank you very much for your detailed answer! Highly appreciated :slight_smile:

This really did solve the problem. I switched it to:

The actual instructions (including that it’s a conversation with an AI)

User: Example Input
AI: Example Output
User: New Input
AI:

The output is much more accurate now.