I have successfully fine-tuned an OpenAI model, but its performance is weak and it only completes some basic answers. I now want to pass product SKU codes as unique identifiers, each as a single token, to improve its performance.
Can someone help me understand how I can convert unique IDs or SKU codes into single tokens that can be used inside my training dataset files?
Here is an example prompt and completion inside my training file:
prompt: “can you provide me with 3 sku codes for dark grey products that require no assembly?”
completion: " TBAC2899, TBQC2819, ACAC2811"
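For reference, here is a minimal sketch of how this pair would be written to a JSONL training file, assuming the legacy prompt/completion fine-tuning format (one JSON object per line, completion starting with a space). The `to_jsonl` helper and the variable names are only illustrative, not part of any OpenAI library.

```python
import json

# The single prompt/completion pair shown above.
examples = [
    {
        "prompt": "can you provide me with 3 sku codes for dark grey products that require no assembly?",
        "completion": " TBAC2899, TBQC2819, ACAC2811",
    },
]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line (hypothetical helper)."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Each SKU code here is still plain text inside the completion string, so the tokenizer is free to split it into several pieces.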
I have many basic prompts and completions that work very well.
I have also associated each product SKU code with its features and other product properties in the training set, as separate prompts and completions, and that works well too.
But when I ask the fine-tuned DaVinci model to respond with a list of SKU codes, it hallucinates incorrect SKU codes. That is why I want to pass each code as a single token.
How my SKU codes are tokenized right now: