Is there any way to minimise the cost of a lengthy, but often-used, prompt?

We have a workflow which works well with chat completions:

  • Give the system instruction (1600 tokens, constant)
  • Give the user input (~300 tokens, variable)
  • Consume GPT response (~5 tokens, variable)

Since the instruction is always the same, we’d rather not pay full rate for all those tokens, if we can avoid it.
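To put numbers on it: roughly 1600 of the ~1900 input tokens per request (about 84%) are that same fixed instruction, so most of what we pay for is text that never changes.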

With the chat completions API, we don’t seem to have any way around it.
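For illustration, each call currently looks roughly like this (the model name and instruction text below are placeholders):

from openai import OpenAI

client = OpenAI()

SYSTEM_INSTRUCTION = "..."  # the same ~1600-token instruction, resent in full on every call

def classify(user_input: str) -> str:
    # Every request is billed for the constant instruction plus the variable input.
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_INSTRUCTION},
            {"role": "user", "content": user_input},
        ],
    )
    return completion.choices[0].message.content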

With the assistants API:

  • We can pre-load the instruction, but it seems to be billed at full rate within the context of the thread
  • We can load the instruction as a file to be retrieved. But (although the docs don’t seem to mention this) our testing shows the retrieved content still counts as tokens, seemingly even more tokens than the plain instruction.

So is there any way we can be more efficient than repeating our instruction for every single request?

1 Like

Fine-tuning.

3 Likes

Yes, some descriptions of fine-tuning seem to suggest it’s the answer.

However, all the fine-tuning documentation seems to describe providing datasets: collections of examples of how the model should respond to different inputs.

I haven’t seen any documentation or examples showing that the initial instruction won’t be required again on subsequent runs. The docs seem to only talk about refining the response to a given instruction - not eliminating the need for the instruction.

Are there examples you can share which sound like our use case?

2 Likes

Right, but you’d make the system prompt a LOT shorter in the dataset, since you “control” the output: you’re the one constructing the dataset…

1 Like

I’m not so sure (yet) that fine-tuning really answers it.

For example, here is a dummy training dataset from the docs:

{"messages": [{"role": "system", "content": "Given a sports headline, provide the following fields in a JSON dict, where applicable: \"player\" (full name), \"team\", \"sport\", and \"gender\"."}, {"role": "user", "content": "Sources: Colts grant RB Taylor OK to seek trade"}, {"role": "assistant", "content": "{\"player\": \"Jonathan Taylor\", \"team\": \"Colts\", \"sport\": \"football\", \"gender\": \"male\" }"}]}
{"messages": [{"role": "system", "content": "Given a sports headline, provide the following fields in a JSON dict, where applicable: \"player\" (full name), \"team\", \"sport\", and \"gender\"."}, {"role": "user", "content": "OSU 'split down middle' on starting QB battle"}, {"role": "assistant", "content": "{\"player\": null, \"team\": \"OSU\", \"sport\": \"football\", \"gender\": null }"}]}

Then, the example usage of the resulting model:

from openai import OpenAI
client = OpenAI()

completion = client.chat.completions.create(
  model="ft:gpt-3.5-turbo:my-org:custom_suffix:id",
  messages=[
    {"role": "system", "content": "Given a sports headline, provide the following fields in a JSON dict, where applicable: player (full name), team, sport, and gender"},
    {"role": "user", "content": "Richardson wins 100m at worlds to cap comeback"}
  ]
)

print(completion.choices[0].message)

In this example, the system prompt is repeated in its entirety, which is exactly what we’re trying to avoid. We aren’t including examples in our prompt; we’re already happy with the outputs without them.

1 Like

Can you share your system prompt?

The idea would be to trim it down enough so you save tokens but it still makes sense. Because you’re providing the “answer” it gives in the dataset, you won’t need the “full” prompt you’re using now… likely…

1 Like

Here is the meat of it. We append the “categories” at the bottom.

(Because this is a classification use case, we have also explored using embeddings. But the data seems far too noisy for any accuracy with just embeddings. GPT can discern the context much better).

`You are a classification API. You categorise customer support enquiries.

Each customer enquiry may come via SMS or via email. The enquiry may contain an email thread and/or a subject line.
DO NOT place ANY weight on the email thread or subject line unless the customer's writing does not clearly match any of the predefined categories.
This is because reply threads and subject lines can be very misleading.
ALWAYS focus most carefully on the customer's actual enquiry.
ONLY refer to the email thread context or subject line when the customer's writing is ambiguous or unclear.

Use the below taxonomy for classification. When given input text (the customer enquiry), you only ever respond with an array of 'category_label' values from this taxonomy, reflecting the most appropriate categories for that customer enquiry.
One or more categories may be appropriate for a single enquiry.
Your response should take this EXACT JSON format: { "123": ["classification_1", "classification_2", ...] }  replacing "123" with the ticket_id number, and populating the array with the different classifications. 
If there doesn't seem to be any appropriate category for this enquiry, respond like: { 123: ["OTHER"] }.
Do not assume that every enquiry falls within a taxonomy category. Many enquiries are properly categorised as "OTHER".

label | description
-------------------
${taxonomy.map((t) => `${t.label} | ${t.description}`).join('\n')}`

1 Like

Hi - This may look like a drastic change to you, but I am pretty sure you can narrow it down to something along these lines. I’ve fine-tuned models for classification (multi- and single-label) quite a bit, and in my experience this should suffice. “Other” should just form part of your list of pre-defined classifications - the model will pick up the pattern for when to classify an enquiry as “Other” during training (provided you include examples with “Other” in the training set).

`You are a classification API. You categorise customer support enquiries into one or multiple pre-defined categories in accordance with the taxonomy provided. Your response should take this EXACT JSON format: { "123": ["classification_1", "classification_2", ...] } replacing "123" with the ticket_id number, and populating the array with the different classifications.`

Then of course add your taxonomy etc.
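As a rough sketch of what a training file along those lines could look like (the ticket data, the "REFUND_STATUS" label and the taxonomy rows are hypothetical stand-ins; only "OTHER" comes from your prompt):

import json

# Hypothetical stand-in for the real taxonomy rows appended to the prompt.
taxonomy_table = (
    "REFUND_STATUS | Customer asks where their refund is\n"
    "OTHER | No other category fits"
)

TRIMMED_PROMPT = (
    "You are a classification API. You categorise customer support enquiries into one or "
    "multiple pre-defined categories in accordance with the taxonomy provided. Your response "
    'should take this EXACT JSON format: { "123": ["classification_1", "classification_2", ...] } '
    'replacing "123" with the ticket_id number, and populating the array with the different '
    "classifications.\n\nlabel | description\n-------------------\n" + taxonomy_table
)

# Hypothetical labelled history: (ticket_id, enquiry_text, labels), including "OTHER" examples.
labelled_tickets = [
    (101, "Hi, I still haven't received my refund for order #4589", ["REFUND_STATUS"]),
    (102, "Do you sell gift vouchers?", ["OTHER"]),
]

with open("train.jsonl", "w") as f:
    for ticket_id, enquiry, labels in labelled_tickets:
        example = {
            "messages": [
                {"role": "system", "content": TRIMMED_PROMPT},
                {"role": "user", "content": f"ticket_id: {ticket_id}\n{enquiry}"},
                {"role": "assistant", "content": json.dumps({str(ticket_id): labels})},
            ]
        }
        f.write(json.dumps(example) + "\n")

At inference time you’d then send that same trimmed prompt (plus the enquiry) to the resulting ft: model, instead of the full 1600-token instruction.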

3 Likes

I would say that most of the text in your prompt isn’t needed for the categorization task you’re asking the model to perform.

Your prompt is describing in exacting detail the patterns it should look for in the input text, and if you were talking to a human that’s exactly what you’d want to do. But you’re talking to an LLM, not a human.

These models are excellent pattern matchers, so all you need to do is show them a diverse cross-section of examples and they’ll find the pattern. They’re way better at finding patterns than they are at following instructions.

3 Likes