I have several questions regarding GPT-3.5 / GPT-4, Threads, & Assistants

I have several questions that I want to clarify across these three sections:

First Section - GPT-3.5 / GPT-4

  1. The docs state that the newest model's training data covers 2023 - ???. How far does that range extend? Is it up to the current date? Say this question is posted on Feb 26th, 2024: does that mean the model's knowledge has zero gap and is up to date to that exact date, or how large is the gap? I tried to search the documentation but there is no exact statement about this; it might be helpful to add one. For example, if a new movie releases next week, can the model tell me the schedule and which cinemas are showing it?

  2. Regarding streaming: since we already know that ChatGPT itself streams its answers, I'm curious what the biggest difference is between the normal method and the streaming approach. Does it make the answer more precise than the normal way, or does it make the cost cheaper, with the added benefit of a faster result? (I guess it somehow chunks the full response before returning it, and the documentation only says "iteratively" without touching on any other reason, so I'm curious about the advantages and disadvantages of both methods.)

Second Section - Threads

  1. Is there an example of what a thread response will look like? The documentation only shows the request from the user's perspective, like
    "How does AI work? Explain it in simple terms.", but I can't find an example of OpenAI's answer to this question or what that response looks like.

  2. Say I have three threads: thread A about music, thread B about fashion, and thread C about hiking. Is there any way for thread A to reference the topic of thread C? That is, can something I said on thread C about hiking be referenced from thread A or thread B? For example, in the hiking thread I have a joke about the meaning of "Huwawa", which means camping without a fire to deter bears. If I say "do you still remember Huwawa? Can you create a song about it?" on thread A, can it still pick up that reference? Is there some way I can achieve this, or is cross-thread context simply not possible, i.e. a limitation of threads?

  3. There is a file ID parameter in the documentation, and I want to confirm this case: if I use the Code Interpreter with this file ID, does that mean OpenAI can read and use the content of the file itself as context? Say I upload a PDF containing a story; can it tell me the conclusion of the story from the file I uploaded via this parameter?

  4. Do thread logs have the same behavior as Chat, where they are removed after 30 days? If so, will the thread still know what we talked about before, or should we consider the knowledge in that thread gone?

  5. Are the tokens already in the thread counted as request tokens every time I send a new message on the thread? Say I set a rule on the thread that it should always answer with some kind of symbol; by the third message I just write "Hello world" and the reply comes back as a symbol, right? Is that initial rule counted as request tokens again for every message after I state it, or is it treated separately? (E.g., if the rule costs 300 tokens and a new request message counts as 45 tokens, will each new message always be billed as 345 tokens, or will it stick to 45 tokens?)

Third Section - Assistants
This is actually the most interesting part of the new update, and I have some questions about it.

  1. Since Assistants have (from my understanding) the same core behavior as threads, what is the most noticeable difference from a thread? Since a thread can also use tools like Code Interpreter & Retrieval, it's hard for me to find the difference between the two.

  2. I believe I read in the forum about Assistant token costs that usage counts as free except for Code Interpreter & Retrieval, which are billed per session. If that's correct (CMIIW), then as long as we don't use any tools (i.e., no file upload or retrieval), it has the same cost behavior as threads and chat, right?

  3. Last is about the cost of using it. I think it starts charging the moment you call a tool, but say a tool session costs 0.2 for an hour: if it's now 14:00, does that mean it will charge me again at 15:01, or is it billed by usage duration? That is, if I only use it for half an hour, do I still have half an hour left, so I won't be charged again if I use it from 16:00 to 16:30?

Thanks in advance, and sorry for the number of questions; I'm really excited to find out how far OpenAI can take my project.

Best Regards,
Rakish

Welcome @RFTaurust - I'll get started with a few answers here; I'm sure others will chime in soon to address the rest.

The docs state that the newest model's training data covers 2023 - ???. How far does that range extend? Is it up to the current date? Say this question is posted on Feb 26th, 2024: does that mean the model's knowledge has zero gap and is up to date to that exact date, or how large is the gap? I tried to search the documentation but there is no exact statement about this; it might be helpful to add one. For example, if a new movie releases next week, can the model tell me the schedule and which cinemas are showing it?

For each group of models, the training cut-off date is currently slightly different. This page provides you with a detailed breakdown: https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
GPT-4 Turbo models have the most recent training data, up to December 2023. This means there remains a gap of a few months. Practically speaking, if you were to ask the model about an event that happened in January 2024, it would not know the answer unless you provide it with additional external information.
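To sketch what "providing additional external information" can look like in practice, here is a minimal example using the Chat Completions endpoint of the openai Python library (v1.x). The movie, cinema, and context text are placeholders you would normally fetch yourself (e.g. from a showtimes API or a web search), and the model name is just an example:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical post-cutoff facts we fetched ourselves; the movie
# and cinema are placeholders, not real data.
recent_context = (
    "As of February 2024: the film 'Example Movie' opens on March 4, 2024 "
    "and is showing at Cinema XYZ."
)

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",  # substitute whichever model you use
    messages=[
        {"role": "system",
         "content": "Answer using this up-to-date context: " + recent_context},
        {"role": "user",
         "content": "When and where does 'Example Movie' open?"},
    ],
)
print(response.choices[0].message.content)
```

Without the injected context, the model would have to rely on its December 2023 training data and could not answer reliably.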

Regarding streaming: since we already know that ChatGPT itself streams its answers, I'm curious what the biggest difference is between the normal method and the streaming approach. Does it make the answer more precise than the normal way, or does it make the cost cheaper, with the added benefit of a faster result? (I guess it somehow chunks the full response before returning it, and the documentation only says "iteratively" without touching on any other reason, so I'm curious about the advantages and disadvantages of both methods.)

The primary purpose of streaming is to reduce the time until users see the first parts of the response. This is particularly useful when it would normally take a significant amount of time for the model to produce a complete response. Streaming has no impact on answer quality, nor is it cheaper.
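To make that concrete, here is a minimal sketch of the same request made both ways with the openai Python library (v1.x); the prompt and model name are just examples. The tokens billed and the answer quality are the same either way; only the delivery differs:

```python
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Explain how AI works in simple terms."}]

# Non-streaming: the call blocks until the full answer is ready.
full = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(full.choices[0].message.content)

# Streaming: tokens arrive in chunks as they are generated,
# so users start seeing output almost immediately.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo", messages=messages, stream=True
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks carry no text (e.g. the final one)
        print(delta, end="", flush=True)
```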

For the relationship between Assistants and threads, you should go over this official documentation, which should help to address some of your broader questions:

https://platform.openai.com/docs/assistants/overview

https://platform.openai.com/docs/api-reference/assistants
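As a quick sketch of how the pieces in those docs fit together (assuming the openai Python library v1.x and the beta Assistants API; the model choice, instructions, and question text below are placeholders):

```python
import time
from openai import OpenAI

client = OpenAI()

# An Assistant holds the reusable configuration: model, instructions, tools.
assistant = client.beta.assistants.create(
    model="gpt-4-turbo-preview",              # placeholder model choice
    instructions="You are a helpful tutor.",  # placeholder instructions
)

# A thread holds one conversation's messages, independent of any Assistant.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="How does AI work? Explain it in simple terms.",
)

# A run applies the Assistant to the thread and appends the reply to it.
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id=assistant.id
)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Messages come back newest first; the top one is the assistant's answer.
for message in client.beta.threads.messages.list(thread_id=thread.id).data:
    print(f"{message.role}: {message.content[0].text.value}")
```

That split is also the short answer to your Assistant-vs-thread question: in those docs, the Assistant is the reusable configuration, the thread is the per-conversation state, and a run ties the two together.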