Which api to use in my case?

apris · September 27, 2024, 4:43am

Hello everyone ! My task is to turn a set of texts (subtitles from YouTube videos) into readable articles. Before processing, I need to send an instruction (about 5000 tokens) on how to work with them. After that, I will send the texts (subtitles) one by one. It is important that the LLM does not forget the instruction during the process. Which API would be best suited for this?

jochenschultz · September 27, 2024, 5:17am

You are expecting too much - very sorry to say that. But LLM are not on that level. Not even close.

I would even say that a 5000 token Instruction doesn’t have any sense either - there might be exceptions e.g. when you are using Multishot Prompting and add tons of examples. But expecting that any model would be capable of taking 5000 token long instruction and then follow them all is another level.

Think of a LLM like an intern that has a father who got a phd in everything.

You can give it some information and then it will try to remember as much as possible of that and ask the father and then come back with the answer.

jochenschultz · September 27, 2024, 5:30am

You could use multiple interns with smaller instruction.

Like one so called micro agent that searches for any type fruit in a sentence and then writes to a datastore:

sentence | fruit | and all the other results here…
this is a strawberry | yes

I know there are better data structures

icdev2dev · September 27, 2024, 6:05am

So I looked up this YouTube Video

had the following Youtube SubTitles

Monte Carlo Tree Search - MCTS in AI Agents
Strategist: Learning Strategic Skills
Strategy update process with Self-play and MCTS
MCTS explained step by step
Add an ethics filter to MCTS simulations

Generated the following content through selfet :

I use the AssistantAPI in selfet. (GitHub - icdev2dev/selfet)

hth

platypus · September 27, 2024, 7:46am

Hi @apris !

Regarding your original question: standard ChatCompletions API using a GPT-4o model should work fine. Provide the instructions in the system prompt, and the transcripts in the user prompt.

As others have pointed out, a very large instruction like yours may not be optimal. The largest system prompt I worked with that yielded acceptable results was around 1000 tokens, and that was for a large data extraction and formatting task, which I would argue is easier for an LLM. You may want to either simplify your instruction so it’s <=1000 tokens, or consider doing it in multiple stages (multiple sequential API calls), e.g.

Translate to <INSERT_LANGUAGE_HERE>
Remove any superfluous information (side comments / gestures, background noise, interruptions)
Find top-3 takeaways / themes
Combine the output from (3) and (2) to provide a summary
Use the output from (4) together with <INSERT_LANGUAGE_STYLE_INSTRUCTION_HERE> to tidy up the summary
…

Hope this helps!

apris · September 27, 2024, 9:52am

The same scenario works in web version (we checked it before). So maybe the level is ok, just need to find out the corresponding API?

jochenschultz · September 27, 2024, 9:53am

In web version you select the model

apris · September 27, 2024, 9:54am

Sorry, how exactly it answers my questions? Do I need to use assistants API or?

icdev2dev · September 27, 2024, 3:55pm

Your question was “which API to use in my case”.

What I have been advocating for (and the code is open source) is to use both the Assistant API and the Chat Completion API. The rationale, in my mind, is simple.

Use Assistants, Threads and Messages from Assistant API) as a persistent store of instructions and interactions with the LLMs. Use ChatCompletion to do text generation.

One of the benefit is much better control over the interaction (i.e. one can ignore certain messages and focus the context on things that matter).

Talking about context, I have not used 5000 tokens to produce instructions and it may be overkill for an advanced LLM like gpt-4o, gpt-o1. This is because these LLMs understand much more than having to feed detailed instructions. Of course, do your own experimentation,

hth

apris · September 28, 2024, 3:12pm

Great idea! Where can I find example of implementation? (as you mentioned, it is open source)?

sudirlay · September 28, 2024, 4:04pm

I’ll answer your question simple and straightforward, use the chat completions API and train a custom model from a base where you train it on most of the instructions you are sending so that when you initiate a completions session for the YouTube text, you are not sending all those instructions everytime with the text. This is what training your own model is for, if you train it well enough it should be at a point where you don’t have to send any instructions to it anymore, you are simply sending the YouTube text in your API calls to chat completions API with your custom model ID and it knows all the instructions you’ve trained it on to handle your operations without reminder.

icdev2dev · September 28, 2024, 4:33pm

hth

thinktank · September 28, 2024, 5:03pm

I also think we can all agree that they’re called Micro Agents and @stevenic coined the phrase. Hail steve.

Good rule of thumb. Thanks.

That’s a neat idea IC’d. But why not just use Assistants across the board since they’re basically Completions with more abilities?

There really is no way to “train your own model” right now. You can fine tune a model, but this is to help “fine tune” things like tone and diction, its not the same thing as training on a data set. @jr.2509 describes it well here, and recommends this article.

Finally, @apris I agree with ICD and Jochen that you should be looking at hyper specialized micro agents to individually work on your flow on single-minded tasks. Think of Henry Ford’s Assembly Line but for tireless AI Assistants.

icdev2dev · September 28, 2024, 5:14pm

My initial thing (started about 10 months ago) was that AssistantAPI (Assistants, Threads, Messages …) was in Beta. It is still in Beta. What I noticed the relative periodic runtime instability of the inferencing part of the AssistantAPI versus relative stability of the core data structures underlying the AssistantAPI (essentially Queues, Sets and Modifiable Sets). The ChatCompletion API had been stable for a long time and in GA.

From that initial thing, my thoughts have evolved quite a bit in context of further nuanced benefits of using the approach of AssistantAPI + ChatCompletion.

Primary amongst them :

Ability to ignore certain messages
Ability to list threads
Abiliity to customize threads and use that custom behaviour
…

thinktank · September 28, 2024, 5:38pm

This seems like another excellent rule of thumb: “Pass your data to the most stable system whenever possible.”

MrFriday · September 28, 2024, 6:02pm

I am sorry but I disagree. Everything that OP has mentioned in achievable via Chat Completion.

jochenschultz · September 30, 2024, 3:50am

sudirlay · October 16, 2024, 6:04am

This is what I told this guy, also that he can train a model IE fine tune etc. and it will save him having to send custom instructions for the chat everytime he initiates a session. Ive done it but Thinktank is so fast to dismiss someone’s answer smh. I hope he takes the advice.

Which api to use in my case?

Related topics