I’m not clear on which endpoint I should be using.
Let’s say I want to pass a whole text and I need a summary, or I need GPT to select adjectives or grammar mistakes.
Should I be using /v1/completions or /v1/chat/completions?
The advantage I see when using chat/completions is that I can send the main prompt with the instruction once, then keep on appending the text until I reach the max tokens.
So I can actually save a bit of money there.
Apart from that, I see that chat/completions includes the GPT models, while /v1/completions only offers the text-* ones, like text-davinci-003.
The GPT chat models are generally more powerful and the recommended starting point for most tasks, but it is best to try both with your use case and see which works better. GPT-3.5 is also far cheaper than davinci.
You don’t actually save any money. If the context is included in inference by any means, then you pay for the tokens.
Won’t the chat remember the first message where I send the instructions, so that I can then just append the text in groups to avoid reaching the max tokens?
If I use /v1/completions I have to send the instructions on every single request, don’t I?
The system prompt will contain the main instruction for the session.
The user can submit whatever they want, for example post a block of text and ask for a summary, then later in the conversation ask the AI to perform several other tasks against it.
However, I would put the block of text in a separate part of the UI from the main chat interface. I do this because I append it to the system prompt, so that you can refer to it throughout the session.
Let’s say your UI has a TextArea for the block of text and an Input text box for chat.
You then just append whatever is in the TextArea to your system prompt.
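// Assumes the page has a <textarea> holding the block of text
const textarea = document.querySelector('textarea')
const messages = [
{ role: 'system', content: `You are a helpful assistant. Refer to the block of given text below:\n${textarea.value}` },
{ role: 'user', content: 'Please summarize the given text' }
]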
...
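To make the TextArea idea above a bit more concrete, here is a rough sketch of how the messages array could be rebuilt for each request. The helper name `buildMessages` and its parameters are made up for illustration; only the `role`/`content` message shape comes from the Chat API.

```javascript
// Hypothetical helper (not part of any SDK): rebuilds the full messages
// array for every request from the reference text, the earlier turns,
// and the new user input.
function buildMessages(referenceText, history, userInput) {
  const system = {
    role: 'system',
    content: `You are a helpful assistant. Refer to the block of given text below:\n${referenceText}`,
  };
  // The system message goes first, then the earlier turns, then the new question.
  return [system, ...history, { role: 'user', content: userInput }];
}

// Earlier turns kept by your own app; the assistant text is a placeholder.
const history = [
  { role: 'user', content: 'Please summarize the given text' },
  { role: 'assistant', content: '(summary returned by the model)' },
];

const followUp = buildMessages('Some long article...', history, 'Now list the adjectives.');
console.log(followUp.map((m) => m.role).join(', '));
```

The point of the sketch is that the block of text travels inside the system message on every call, so it is counted as input tokens each time.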
Ah yeah, but that still means I have to append the same prompt with the instruction on every API call, so I wouldn’t be saving tokens / $ on it, right?
Yes, you need to attach it on every call to maintain context. I do not see any way around it. For the Chat API, the input prompt price is $0.0015/1K tokens ($0.002/1K for output), so it is cheaper. Text Completion using Davinci is, I think, $0.02/1K tokens.
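As a rough back-of-the-envelope check, the sketch below plugs the prices quoted above into a small calculator. The function names are invented for illustration, the prices are just the ones mentioned in this thread and may be out of date, and I am assuming the Davinci price applies to input and output tokens alike.

```javascript
// Prices per 1K tokens as quoted in this thread (may be out of date).
const PRICE_PER_1K = {
  chatInput: 0.0015,
  chatOutput: 0.002,
  davinci: 0.02, // assumption: same rate for prompt and completion tokens
};

// Cost of one Chat API request.
function chatCost(inputTokens, outputTokens) {
  return (inputTokens / 1000) * PRICE_PER_1K.chatInput
    + (outputTokens / 1000) * PRICE_PER_1K.chatOutput;
}

// Cost of one Davinci text completion request.
function davinciCost(inputTokens, outputTokens) {
  return ((inputTokens + outputTokens) / 1000) * PRICE_PER_1K.davinci;
}

// Cost of processing a text in chunks when the instruction is resent
// (and billed) with every chunk.
function chunkedJobCost(instructionTokens, chunkTokens, outputTokensPerChunk) {
  return chunkTokens.reduce(
    (total, tokens) => total + chatCost(instructionTokens + tokens, outputTokensPerChunk),
    0,
  );
}

// Example: a 200-token instruction resent with ten 3000-token chunks,
// each producing about 300 tokens of output.
const chunks = new Array(10).fill(3000);
console.log('Chat API total: $' + chunkedJobCost(200, chunks, 300).toFixed(4));
console.log('Davinci, single chunk: $' + davinciCost(3200, 300).toFixed(4));
```

The instruction tokens show up in every chunk's input count, which is why repeating them does not come for free, even though the per-token rate on the Chat API is much lower.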
The API has no state, memory, or concept of threads. If you want this behavior, you need to send the previous messages (including the system message) on each request, and pay for all the tokens each time.
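Here is a minimal sketch of what "send the previous messages on each request" can look like in practice. The helper names (`buildRequestBody`, `ask`, `callApi`) are hypothetical, and the API key is assumed to be in the `OPENAI_API_KEY` environment variable; the endpoint URL and the `choices[0].message` response shape are those of the Chat Completions API. It assumes a runtime with a global `fetch` (Node 18+ or a browser).

```javascript
const API_URL = 'https://api.openai.com/v1/chat/completions';

// The request body always carries the whole conversation so far.
function buildRequestBody(history, model = 'gpt-3.5-turbo') {
  return { model, messages: history };
}

// Real network call; assumes OPENAI_API_KEY is set in the environment.
async function callApi(body) {
  const res = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  const data = await res.json();
  return data.choices[0].message; // { role: 'assistant', content: '...' }
}

// Adds the user turn, sends the full history, and stores the reply.
// `send` is injectable so the flow can be tried without the network.
async function ask(history, userText, send = callApi) {
  history.push({ role: 'user', content: userText });
  const reply = await send(buildRequestBody(history));
  history.push(reply);
  return reply.content;
}
```

Usage would be something like `const history = [{ role: 'system', content: '...' }]; await ask(history, 'Please summarize the given text');`. The `history` array lives entirely in your own code; the API only ever sees what is in the current request.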