Suggestions on how to implement prompts that depend on previous answers

Hi, I’m new and I’m learning about AI.
I’ll explain my project so you can help me. The ultimate goal is to produce a marketing document from responses generated by the AI. I know how to generate the document; my problem is generating the prompts. The context and part of the information come from questionnaires, and some information must be derived by the AI from that input. So some prompts will take information from these questionnaires as input, and others will build on the answers to previous prompts.
For example: from the questionnaires I learn that the company operates in sector X, so I ask the AI who its possible competitors might be. From that answer I can build another prompt asking how the company differs from those competitors, and so on.
I know that with the API I don’t automatically get a history of the previous prompts, so I was thinking of concatenating the previous answers onto the next prompt, but I’m afraid of running out of tokens.
I’ve read various explanations of fine-tuning and embeddings, both on this forum and elsewhere online, but I’m still confused about them. I don’t even know whether they would be useful in my situation, since I don’t have all the context information a priori.
I also specify that I use PHP.
What do you advise me to do? Is it better to use fine-tuning, embeddings, or neither?
I was thinking of using Completions, specifically text-davinci-003, but I also know that fine-tuning and embeddings are not available for all models, so any suggestions on which model to use would be great.

I hope I made myself clear and thanks in advance for any advice!

I suggest you begin with a diagram of the workflow and structure.

When it is clear what needs to happen then it will be possible to find the best way to achieve the goal.

Thank you, but I don’t think I explained myself well. In summary, I wanted to know whether, besides concatenating the previous answer onto the prompt to provide context (a test I’ve already done, and it works, but I quickly exceed the token limit), it is possible to use embeddings or fine-tuning even though I don’t know the data a priori (as I said, I don’t quite understand how they work or how to use them).
So I already have a “solution”, but I wanted to know if there are more efficient methods that use fewer tokens.

Hi and welcome to the forum!

You could indeed embed the text, search that embedded data for context relevant to your new query, embed the result, and so on. However, you can also just throw away the oldest chat messages when you reach the context limit. That is how ChatGPT does it: it leaves enough room for a typical response and cuts off any old context over the token limit. It works surprisingly well.
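That trimming approach can be sketched like this (Python; the same logic ports easily to PHP). The `rough_tokens` heuristic of roughly four characters per token is an assumption for illustration, not an exact tokenizer:

```python
def rough_tokens(text):
    # Crude heuristic: ~4 characters per token. A real tokenizer
    # gives exact counts; this is only an estimate for budgeting.
    return max(1, len(text) // 4)

def trim_history(messages, token_budget):
    """Drop the oldest non-system messages until the history fits
    within the budget, leaving room for the model's response."""
    trimmed = list(messages)
    total = sum(rough_tokens(m["content"]) for m in trimmed)
    while total > token_budget and len(trimmed) > 1:
        # Keep the system message at index 0; drop the oldest turn after it.
        drop_at = 1 if trimmed[0]["role"] == "system" else 0
        total -= rough_tokens(trimmed[drop_at]["content"])
        del trimmed[drop_at]
    return trimmed
```

You would call `trim_history(history, budget)` right before each API call, with `budget` set to the model's context size minus the room you want to reserve for the answer.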

Utilizing extraction prompts works far better than embeddings/vector search, is simpler, and the results can be stored in variables for use further along your workflow.
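A minimal sketch of that chained-prompt pattern, with the model call stubbed out (swap `ask_model` for a real API call); the prompt wording and variable names are purely illustrative:

```python
def ask_model(prompt):
    # Stub standing in for a real completion/chat API call.
    return f"<answer to: {prompt!r}>"

def run_workflow(sector):
    # Each answer is stored in a plain variable and injected
    # into the next prompt template.
    competitors = ask_model(
        f"List the main competitors of a company in the {sector} sector."
    )
    differentiation = ask_model(
        f"Given these competitors: {competitors}\n"
        f"How could a company in the {sector} sector differentiate itself?"
    )
    return {"competitors": competitors, "differentiation": differentiation}
```

Because each step's output lives in an ordinary variable, you only pass forward the pieces the next prompt actually needs, instead of the whole conversation history.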

Fine-tuning is for creating specialist models for specific tasks, which are faster, cheaper to run, and can produce better-quality outputs than raw models, IF trained correctly on a high-quality dataset.
It is far better to get your workflow prototype functioning with raw models before considering fine-tuning.

If you have a process flow diagram you can share it for more detailed advice on the best approach for your specific use case.

Thanks everyone for the quick replies. So in my case, trying to use embeddings or fine-tuning would be a waste of time?

I would suggest you get the workflow running with raw models.

You can then consider embeddings or fine-tuning for specific aspects of your system where they will improve speed, improve quality, or reduce cost.

Get it working first :slight_smile:

You can: that’s what the messages array is for. But it sounds like you re-implemented it yourself without realizing it :joy:
Of course, you have to persist the data yourself between calls and rebuild the messages array on every call.

How about summarizing the previous result before you append it, and limiting the summary to some maximum length?
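That idea could look something like this sketch, with the summarizer passed in as a function (in practice it would be another model call along the lines of "Summarize the following in under 100 words: …"); the stub and the 500-character cap are assumptions:

```python
def compress_answer(answer, summarize, max_chars=500):
    """Keep short answers verbatim; summarize long ones before
    they get appended to the next prompt."""
    if len(answer) <= max_chars:
        return answer
    return summarize(answer)[:max_chars]

def naive_summarize(text):
    # Stub summarizer: keep only the first sentence. A real version
    # would call the model with a summarization prompt instead.
    return text.split(". ")[0].rstrip(".") + "."
```

The trade-off the next poster raises is real: whatever the summarizer drops is gone, so it helps to ask it explicitly to preserve the facts (names, figures, sector details) your later prompts depend on.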

I used Create completion (not Chat Completion), so I don’t have the messages array.
But perhaps the idea of using Chat Completion isn’t bad: in addition to the array you mentioned, I can specify the system role to get more suitable answers. Thanks for the suggestion!

But I have a doubt: aren’t the messages in the array also counted as tokens?

I’ve thought about it, but I’m afraid important details will be lost in the summary. I’ll try it anyway, thanks!

Yes, sadly. But they’re billed at the same rate as Completion tokens (I think). Plus, the Completions service is marked as legacy, so if you build against it you might have to redo things to use Chat if/when they sunset Completions anyway.

Hi, is there a thread that talks about all the prompts that exist for GPT? Because I can’t find it.