I have hundreds of narratives (short paragraphs) that I want to tag with one or more words.
Even when using gpt-3.5-turbo-16k, the context window is still not large enough to fit 50-100 pages of narratives.
All narratives need to be submitted before generating the tags because some tags may apply to multiple narratives (the full context needs to be provided in my case).
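To get a rough sense of how large the prompt is before sending it, I estimate tokens with a chars/4 heuristic (a sketch only; a real tokenizer such as tiktoken would be more accurate, and the narrative strings here are placeholders):

```javascript
// Rough token estimate: ~4 characters per token for English text.
// Heuristic only; use a real tokenizer for exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Placeholder narratives for illustration:
const narratives = ["First short narrative...", "Second short narrative..."];
const totalTokens = narratives.reduce((sum, n) => sum + estimateTokens(n), 0);
```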
I’m using the chat completion api endpoint.
https://api.openai.com/v1/chat/completions
I’m using Node.js. Here is the backend request:
const { data } = await openAiApi.post(`/chat/completions`, {
  model: "gpt-3.5-turbo-16k",
  messages: [
    {
      role: "system",
      content:
        "You are an experienced qualitative researcher and text analyst. You are working on a project where you need to analyze a large number of short narratives.",
    },
    { role: "user", content: content },
  ],
  temperature: 0.4,
});
The content of the user message is:
const content = `
You will be provided with short narratives. Create qualitative tags for each narrative. The tags should help the researcher organize and combine narratives based on common themes. The goal is to group narratives that share a common theme.
Provide your output in json format (array of objects) in the following format:
[
  {
    "label": "<insert tag label here>",
    "comment": "<insert comment here>",
    "color": "<insert color here>",
    "narrativesIds": ["<id1>", "<id2>", "<id3>"]
  }
]
Where label is the tag label you come up with, comment is a short description of the label, and narrativesIds are the ids of the narratives that the label applies to.
For the color field, choose a random value from the following list: purple, teal, orange, amber, brown, grey, deepOrange, blueGrey, yellow, red, pink, cyan, lightBlue, green, indigo, blue, lime.
The narratives are:
${narratives.join("\n\n")}
Output a JSON of all the tags for all narratives. Remember, the same tag could be used for multiple narratives. The tags should be unique and not repeated. Only return the list of tags as JSON with no other words.
`;
I want to output the response as JSON in order to persist it in a database.
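Before persisting, I parse the reply defensively, since the model sometimes wraps the JSON in extra words despite the instructions. A small sketch (`parseTags` is my own helper name):

```javascript
// Parse the model's reply before persisting. If the reply isn't pure JSON,
// fall back to extracting the first JSON array found in the text.
function parseTags(reply) {
  try {
    return JSON.parse(reply);
  } catch {
    const match = reply.match(/\[[\s\S]*\]/);
    return match ? JSON.parse(match[0]) : null;
  }
}
```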
Sometimes I have 200 narratives, which exceeds the context window. What is the best way to provide the model with ALL narratives (hundreds)? Is there a way to maintain previous context and submit chunks of narratives one at a time? Any help is appreciated!
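One workaround I've been considering is splitting the narratives into size-bounded chunks, submitting them one request at a time, and pasting the tags collected so far into each follow-up prompt so the model can reuse them. A rough, untested sketch of the chunking part (`maxCharsPerChunk` is a placeholder budget I'd tune to stay under the context window):

```javascript
// Split narratives into chunks whose combined length stays under a
// character budget, so each chunk fits in one request.
function chunkNarratives(narratives, maxCharsPerChunk) {
  const chunks = [];
  let current = [];
  let size = 0;
  for (const n of narratives) {
    if (size + n.length > maxCharsPerChunk && current.length > 0) {
      chunks.push(current);
      current = [];
      size = 0;
    }
    current.push(n);
    size += n.length;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

The idea would then be to loop over the chunks, and for chunk 2 onward prepend something like "Here are the tags created so far; reuse them where they apply" plus the accumulated JSON. I don't know if this is the best approach, which is why I'm asking.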