Training Data, Context Data, Query Data - Difference and Token Limitations

mariam.akrout · June 16, 2023, 4:48pm

Hello everyone,
I’m currently working on fine-tuning my model and I’m a bit confused about the distinctions between Training Data, Context Data, Query Data, and their respective token limitations. Let me explain further:

I have a substantial amount of domain-specific data that I want to use for fine-tuning the model. This will help the model understand the specific rules and intricacies of the domain.

Once the model is fine-tuned, it will be applied to specific projects and it needs to understand the context of these projects. After being trained and familiarizing itself with the project context, the model should be able to answer queries based on that context while following the key domain rules it was trained on.

I’m unsure whether there is a difference between the fine-tuning data and the context data, and if there are any token limitations for each. The same question applies to the token limitations for queries.

I would greatly appreciate your help with this matter as it is causing me significant confusion.

Thank you very much.
Best regards.

scottfarris81 · July 3, 2023, 5:03pm

I hope I don’t mess this up, but to make easily undertood generalized abstract definitions, you must absolutely not use what I say as verbatim when you implement. Just use as reference so you will understand what to look for. Training data: is like generalized examples of random comparable data it can be repetitive and similar to the other data but this is to make sure incoherent outputs are less likely. Context data: is like factual information aboutwhat makes the main data relate to a specific group of petential outputs like the difference in getting a balloon at a carnival and renting a hot air balloon you would need to compensate for lack of input to signify the difference as it is interpreted, query Data: like the FAQ section of any website. Token limitations: a tricky one but similar to deciding if (lets use 1 sentence as an example) you are making importance values on what percentage of the sentence is used by each relitive piece of information that is going to make up that sentence so you get a more balanced output. I could be way off but so far that is a rough understanding of how they work. If anything hopefully it makes it easier to discover.

Topic		Replies	Views
What does it mean to limit tokens for fine tuning? Documentation assistants , assistants-pricing	7	4749	December 18, 2023
A question regarding fine-tuning Documentation fine-tuning	8	979	March 12, 2024
How to build a fine-tune Q/A model API embeddings , chatgpt , fine-tuning	4	1036	December 17, 2023
Context length VS Max token VS Maximum length API	8	65770	December 13, 2023
How compartmentalized (if at all) can we be when training our API model? API api	3	585	August 4, 2023

Training Data, Context Data, Query Data - Difference and Token Limitations

Related topics