Which API to use in my case?

Hello everyone :wave:! My task is to turn a set of texts (subtitles from YouTube videos) into readable articles. Before processing, I need to send an instruction (about 5000 tokens) on how to work with them. After that, I will send the texts (subtitles) one by one. It is important that the LLM does not forget the instruction during the process. Which API would be best suited for this?

You are expecting too much, sorry to say. LLMs are not on that level. Not even close.

I would even say that a 5000-token instruction doesn’t make much sense either. There might be exceptions, e.g. when you are using multi-shot prompting and add tons of examples. But expecting any model to take a 5000-token-long instruction and then follow it all is another level.

Think of an LLM like an intern whose father has a PhD in everything.

You can give it some information, and it will try to remember as much of it as possible, ask the father, and then come back with the answer.

1 Like

You could use multiple interns with smaller instructions.

Like one so-called micro agent that searches for any type of fruit in a sentence and then writes to a datastore:

| sentence | fruit | and all the other results here… |
| --- | --- | --- |
| this is a strawberry | yes | |

I know there are better data structures :wink:
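The micro-agent idea above can be sketched in a few lines of Python. This is a hypothetical illustration: `classify` stands in for one small-prompt LLM call, and here it is stubbed with a plain string check so the flow is visible without an API key.

```python
import csv
import io

def fruit_micro_agent(sentence: str, classify) -> dict:
    """One hypothetical micro agent: `classify` wraps a single small-prompt
    LLM call answering "yes"/"no" for whether the sentence mentions a fruit."""
    return {"sentence": sentence, "fruit": classify(sentence)}

def write_datastore(rows: list[dict], fh) -> None:
    """Append the agents' results to a simple pipe-separated datastore."""
    writer = csv.DictWriter(fh, fieldnames=["sentence", "fruit"], delimiter="|")
    writer.writeheader()
    writer.writerows(rows)

# Stubbed classifier instead of a real LLM call:
stub = lambda s: "yes" if "strawberry" in s else "no"
rows = [fruit_micro_agent("this is a strawberry", stub)]
buf = io.StringIO()
write_datastore(rows, buf)
```

In a real setup, each micro agent would carry its own short instruction, far below the 5000-token mark.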

1 Like
  • So I looked up this YouTube video,
  • which had the following YouTube subtitles:
  1. Monte Carlo Tree Search - MCTS in AI Agents

  2. Strategist: Learning Strategic Skills

  3. Strategy update process with Self-play and MCTS

  4. MCTS explained step by step

  5. Add an ethics filter to MCTS simulations

Generated the following content through selfet:

I use the AssistantAPI in selfet. (GitHub - icdev2dev/selfet)

hth

Hi @apris !

Regarding your original question: the standard Chat Completions API with a GPT-4o model should work fine. Provide the instructions in the system prompt and the transcripts in the user prompt.
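A minimal sketch of that setup, assuming the official `openai` Python SDK; `build_messages` and `transcript_to_article` are illustrative helper names, not part of any library:

```python
def build_messages(instructions: str, transcript: str) -> list[dict]:
    """Fixed instructions go in the system message; one transcript per user message."""
    return [
        {"role": "system", "content": instructions},
        {"role": "user", "content": transcript},
    ]

def transcript_to_article(instructions: str, transcript: str) -> str:
    # Requires the `openai` package and OPENAI_API_KEY in the environment.
    from openai import OpenAI
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=build_messages(instructions, transcript),
    )
    return resp.choices[0].message.content
```

Because every call resends the instruction in the system prompt, the model cannot "forget" it between transcripts.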

As others have pointed out, a very large instruction like yours may not be optimal. The largest system prompt I have worked with that still yielded acceptable results was around 1000 tokens, and that was for a large data-extraction and formatting task, which I would argue is easier for an LLM. You may want to either simplify your instruction so it’s <=1000 tokens, or consider doing it in multiple stages (multiple sequential API calls), e.g.

  1. Translate to <INSERT_LANGUAGE_HERE>
  2. Remove any superfluous information (side comments / gestures, background noise, interruptions)
  3. Find top-3 takeaways / themes
  4. Combine the output from (3) and (2) to provide a summary
  5. Use the output from (4) together with <INSERT_LANGUAGE_STYLE_INSTRUCTION_HERE> to tidy up the summary
  6. …
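The staged approach above can be sketched as a simple chain, where each stage is one short instruction applied to the previous stage's output. This is a hypothetical outline: `complete(instruction, text)` is a placeholder for a single Chat Completions call with a small system prompt.

```python
# Hypothetical staged rewrite: each stage replaces one slice of the
# original 5000-token mega-instruction with a short, focused prompt.
STAGES = [
    "Translate to the target language.",
    "Remove superfluous information (side comments, background noise, interruptions).",
    "Find the top-3 takeaways / themes.",
    "Combine the takeaways and the cleaned text into a summary.",
    "Tidy up the summary in the requested language style.",
]

def run_pipeline(transcript: str, complete) -> str:
    """`complete(instruction, text)` stands in for one API call; the stages
    are chained instead of packed into a single giant instruction."""
    text = transcript
    for instruction in STAGES:
        text = complete(instruction, text)
    return text
```

Each stage stays well under the ~1000-token comfort zone, and failures are easier to debug because you can inspect the intermediate outputs.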

Hope this helps!

2 Likes

The same scenario works in the web version (we checked it before). So maybe the level is OK, and we just need to find the corresponding API?

In the web version you select the model.

Sorry, how exactly does that answer my question? Do I need to use the Assistants API, or…?

Your question was “which API to use in my case”.

What I have been advocating (and the code is open source) is to use both the Assistants API and the Chat Completions API. The rationale, in my mind, is simple.

Use Assistants, Threads, and Messages (from the Assistants API) as a persistent store of instructions and interactions with the LLMs. Use Chat Completions to do text generation.

One of the benefits is much better control over the interaction (i.e. one can ignore certain messages and focus the context on the things that matter).
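A sketch of that "focus the context" idea, assuming thread messages have already been retrieved from the Assistants API and are represented here as plain dicts; `focus_context` is an illustrative helper name, not a library function.

```python
def focus_context(thread_messages: list[dict], ignore_ids: set[str]) -> list[dict]:
    """Rebuild a Chat Completions `messages` list from messages stored in an
    Assistants-API thread, skipping the ones we have chosen to ignore."""
    return [
        {"role": m["role"], "content": m["content"]}
        for m in thread_messages
        if m["id"] not in ignore_ids
    ]

# Messages as they might come back from a stored thread:
stored = [
    {"id": "msg_1", "role": "user", "content": "first transcript"},
    {"id": "msg_2", "role": "assistant", "content": "a dead-end draft"},
    {"id": "msg_3", "role": "user", "content": "second transcript"},
]
context = focus_context(stored, ignore_ids={"msg_2"})
```

The filtered `context` can then be passed straight to `client.chat.completions.create(...)`, giving you the persistence of threads with full control over what the model actually sees.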

Talking about context: I have not needed 5000 tokens of instructions, and that may be overkill for an advanced LLM like gpt-4o or o1. These LLMs understand much more without having to be fed detailed instructions. Of course, do your own experimentation.

hth

1 Like

Great idea! Where can I find example of implementation? (as you mentioned, it is open source)?

I’ll answer your question simply and straightforwardly: use the Chat Completions API and train a custom model from a base model, training it on most of the instructions you are sending. That way, when you initiate a completions session for the YouTube text, you are not sending all those instructions every time along with the text. This is what training your own model is for: if you train it well enough, you should reach a point where you don’t have to send any instructions at all. You simply send the YouTube text in your API calls to the Chat Completions API with your custom model ID, and it knows all the instructions you’ve trained it on and handles your operations without a reminder.
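Mechanically, using a fine-tuned model is just a matter of passing its ID in the `model` field of a Chat Completions request; whether fine-tuning can fully replace a long instruction is disputed later in this thread. A sketch of the request shape, with a made-up model ID as a placeholder:

```python
def build_request(transcript: str) -> dict:
    """Request body for a fine-tuned model: only the transcript is sent.
    The model ID below is a hypothetical placeholder for one returned by
    a fine-tuning job."""
    return {
        "model": "ft:gpt-4o-mini-2024-07-18:my-org:subtitles:abc123",  # placeholder ID
        "messages": [{"role": "user", "content": transcript}],
    }
```

The per-call token savings are real, but see the replies below on what fine-tuning does and does not learn.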

hth

:man_facepalming:

I also think we can all agree that they’re called Micro Agents and @stevenic coined the phrase. Hail Steve.

Good rule of thumb. Thanks.

That’s a neat idea, IC’d. But why not just use Assistants across the board, since they’re basically Completions with more abilities?

There really is no way to “train your own model” right now. You can fine-tune a model, but this is to help “fine-tune” things like tone and diction; it’s not the same thing as training on a data set. @jr.2509 describes it well here, and recommends this article.

Finally, @apris, I agree with ICD and Jochen that you should be looking at hyper-specialized micro agents, each working on a single-minded task in your flow. Think of Henry Ford’s assembly line, but for tireless AI assistants.

My initial reasoning (started about 10 months ago) was that the Assistants API (Assistants, Threads, Messages, …) was in beta. It is still in beta. What I noticed was the relative periodic runtime instability of the inferencing part of the Assistants API versus the relative stability of the core data structures underlying it (essentially queues, sets, and modifiable sets). The Chat Completions API had been stable for a long time and in GA.

From that initial observation, my thoughts have evolved quite a bit regarding the further nuanced benefits of the Assistants API + Chat Completions approach.

Primary amongst them:

  • Ability to ignore certain messages
  • Ability to list threads
  • Ability to customize threads and use that custom behaviour
  • …
1 Like

This seems like another excellent rule of thumb: “Pass your data to the most stable system whenever possible.”

I am sorry, but I disagree. Everything that the OP has mentioned is achievable via Chat Completions.

This is what I told this guy: that he can also train a model, i.e. fine-tune, etc., and it will save him having to send custom instructions every time he initiates a session. I’ve done it, but Thinktank is so quick to dismiss someone’s answer, smh. I hope he takes the advice.

1 Like