We have a workflow that works well with the Chat Completions API (sketched in code after the list below):
- Give the system instruction (1600 tokens, constant)
- Give the user input (~300 tokens, variable)
- Consume the GPT response (~5 tokens, variable)
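For reference, each request looks roughly like this. This is a minimal sketch using the `openai` Python SDK (v1.x); the model name and instruction text are placeholders, not our actual values:

```python
from openai import OpenAI

client = OpenAI()

# Placeholder for the constant ~1600-token instruction.
SYSTEM_INSTRUCTION = "...your 1600-token instruction..."

def ask(user_input: str) -> str:
    # Every call resends the full instruction alongside the variable input,
    # so all ~1900 prompt tokens are billed at the normal input rate.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_INSTRUCTION},  # ~1600 tokens, constant
            {"role": "user", "content": user_input},            # ~300 tokens, variable
        ],
    )
    return response.choices[0].message.content
```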
Since the instruction is always the same, we'd rather not pay full rate for those 1600 tokens on every single request.
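To put a number on it, here is a quick back-of-the-envelope calculation using the figures above:

```python
# Rough per-request breakdown using the numbers above.
instruction_tokens = 1600  # constant system instruction
input_tokens = 300         # variable user input

prompt_tokens = instruction_tokens + input_tokens  # 1900 per request
share = instruction_tokens / prompt_tokens         # ~0.84

print(f"{share:.0%} of every prompt is the repeated instruction")
# -> 84% of every prompt is the repeated instruction
```

So roughly 84% of what we pay for on the input side is identical from request to request.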
With the Chat Completions API, we don't seem to have any way around this: every call has to resend the full system message.
With the Assistants API:
- We can pre-load the instruction onto the assistant itself, but it still seems to be billed at full rate on every run within the thread (see the sketch after this list)
- We can upload the instruction as a file to be retrieved. But, although the docs don't seem to mention this, our testing shows the retrieved content still counts as input tokens, and seemingly even more tokens than a plain instruction, presumably because retrieval injects file chunks back into the context.
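For context, here is roughly what our pre-loading attempt looks like. This is a sketch against the beta Assistants endpoints of the v1 Python SDK; the model name and message contents are placeholders:

```python
from openai import OpenAI

client = OpenAI()

SYSTEM_INSTRUCTION = "...the same ~1600-token instruction..."

# One-time setup: attach the instruction to the assistant itself.
assistant = client.beta.assistants.create(
    model="gpt-4o-mini",  # placeholder model name
    instructions=SYSTEM_INSTRUCTION,
)

# Per request: a fresh thread, one user message, one run.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="...the ~300-token user input...",
)
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
# (Polling the run until completion is omitted for brevity.)
# Once the run finishes, run.usage still reports the pre-loaded
# instruction as ordinary prompt tokens billed at the full input rate.
```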
So is there any way we can be more efficient than repeating our instruction for every single request?