Greetings everyone! I’m excited to share that I’ve built a chatbot using LangChain, Pinecone, and the powerful GPT-3.5 Turbo. Currently the chatbot’s responses to user queries are satisfactory, but there’s room for improvement. I’ve noticed that when a user says “Hi,” the chatbot seems to lack context from the provided documents.
To enhance the chatbot’s performance and generate more coherent responses, I’m considering fine-tuning the model to better suit my specific needs. Although I’m already using custom data from my app’s help files, I believe that with fine-tuning I can get even cleaner and more relevant answers. Any guidance on how to proceed with the fine-tuning process would be greatly appreciated; I haven’t found any articles or videos on it so far. Thank you!
Fine-tuning would be ten steps back from “powerful GPT-3.5 Turbo”. The only models you can fine-tune are base GPT-3 models, which are literally sixty times more expensive than gpt-3.5-turbo once fine-tuned, and they come with none of the chat or instruction-following training that makes ChatGPT so impressive.
I said Hi to davinci. It completed that with “The record was excellent! All good!”. I say “print help”? Out comes “Return on Investment (ROI) Data: Shows the Annual Rate…” That’s what you start with.
Step 1 is to come up with a thousand chat-like prompts and replies. Maybe ten thousand.
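To make concrete what those thousands of examples would have to look like: base-model fine-tuning took JSONL files of prompt/completion pairs, with a fixed separator and stop sequence so the model knows where a turn ends. Here is a minimal sketch; the example pairs and the separator strings are my own placeholders, not anything from your app.

```python
import json

# Hypothetical chat-style training pairs; a real dataset would need
# thousands of these covering your app's help content.
pairs = [
    ("Hi", "Hello! I'm the help assistant for this app. What can I help you with?"),
    ("How do I export my data?", "Go to Settings, then Export, and choose a format."),
]

# Legacy base-model fine-tuning expects JSONL records with "prompt"
# and "completion" keys; a consistent separator and stop sequence
# keep the completed model from running on forever.
SEPARATOR = "\n\n###\n\n"
STOP = "\nEND"

def to_jsonl(pairs):
    lines = []
    for prompt, completion in pairs:
        record = {
            "prompt": prompt + SEPARATOR,
            # A leading space on the completion helps tokenization.
            "completion": " " + completion + STOP,
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

print(to_jsonl(pairs))
```

Every chat behavior you want (greetings included) has to be taught this way, which is why the dataset gets large fast.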
So it seems like maybe you just need a solid paragraph of system prompt for what you’ve already got, telling your chatbot what job it does for you, and then it can introduce itself properly when a user says “hi”.
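Something like the sketch below is all that fix takes. The app name and prompt wording are hypothetical; the point is that a system message stating the bot’s job lets it answer “Hi” sensibly even when the vector store returns nothing relevant.

```python
# Hypothetical system prompt; adapt the wording to your own app.
SYSTEM_PROMPT = (
    "You are the in-app help assistant for AcmeApp. "
    "Answer questions using the help-file excerpts provided in context. "
    "If the user just greets you, introduce yourself and say what you can help with."
)

def build_messages(user_input, context_chunks):
    """Assemble a chat-completions message list with retrieved context."""
    context = "\n\n".join(context_chunks) if context_chunks else "(no relevant documents)"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "system", "content": "Context:\n" + context},
        {"role": "user", "content": user_input},
    ]

# A bare greeting retrieves no documents, but the system prompt
# still tells the model how to respond.
messages = build_messages("Hi", [])
```

You would then pass `messages` to the chat completions endpoint as usual.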
Your “there’s room for improvement” issue is that gpt-3.5-turbo lacks the power, efficiency, and downright effectiveness of gpt-4. It’s not just less smart, it’s relatively stupid. I tried using gpt-3.5-turbo with a document store consisting of a variety of real estate legal, regulatory, and public correspondence (bulletins, pamphlets, etc.), and its performance was terrible.
I am doing embeddings, so the LLM is not determining the context documents; the vector database is. I noted that, given the exact same context documents, gpt-3.5-turbo would fail to comprehend their meaning at least 50% of the time, while gpt-4 would correctly analyze them 90+ percent of the time. Same exact documents.
Most of these “chat with your pdf” systems you hear about are using gpt-3.5, and I imagine for small numbers of relatively simple documents, that must work perfectly. But I wouldn’t trust gpt-3.5-turbo for anything serious in a professional environment.
Aah, makes sense. All those YouTube videos are just basic trainings; I couldn’t see any complex problem being solved. You are right in terms of how clean and simple that data would be. Btw, I was just trying gpt-3.5-turbo-16k. It gives some stupid random answers at times.
It would cost the same to use gpt-4-32k as it would to use a fine-tuned davinci model.
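Rough back-of-the-envelope numbers, using the published mid-2023 per-1K-token prices (these are assumptions that change over time, so treat them as illustrative): fine-tuned davinci usage was around $0.12/1K tokens, which is the “sixty times gpt-3.5-turbo” figure mentioned above and in the same ballpark as gpt-4-32k.

```python
# Approximate mid-2023 prices in USD per 1K tokens; assumptions only.
PRICE_PER_1K = {
    "gpt-3.5-turbo": 0.002,        # roughly, combined
    "davinci-finetuned": 0.12,     # usage price after fine-tuning
    "gpt-4-32k-output": 0.12,      # completion-token price
}

def cost(price_per_1k, tokens):
    """Cost in dollars for a given token count at a per-1K-token price."""
    return price_per_1k * tokens / 1000

tokens = 10_000  # e.g. one long document Q&A exchange
print(cost(PRICE_PER_1K["davinci-finetuned"], tokens))  # dollars for fine-tuned davinci
print(cost(PRICE_PER_1K["gpt-3.5-turbo"], tokens))      # dollars for gpt-3.5-turbo
```

At these prices a fine-tuned davinci call costs the same per token as gpt-4-32k output, while being a far weaker model.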
There is so much you can do to improve model responses with in-context few-shot examples and using embeddings to add key information into the context that I would start there well before I would consider fine-tuning.
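A minimal sketch of that combination, assuming hypothetical few-shot examples and a pre-retrieved context chunk; in practice you would pick the examples to mirror the tone and format you want back, and the context would come from your Pinecone lookup.

```python
# Hypothetical few-shot exchange demonstrating the desired answer style.
FEW_SHOT = [
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant",
     "content": "Open Settings > Account > Reset Password, then follow the emailed link."},
]

def build_prompt(system, few_shot, context_chunks, question):
    """Combine system prompt, few-shot examples, and retrieved context
    into a chat-completions message list."""
    messages = [{"role": "system", "content": system}]
    messages += few_shot
    context = "\n\n".join(context_chunks)
    messages.append({"role": "user",
                     "content": f"Context:\n{context}\n\nQuestion: {question}"})
    return messages

msgs = build_prompt("You are a concise help-desk assistant.",
                    FEW_SHOT,
                    ["Exports live under Settings > Data."],
                    "Where do I export my data?")
```

The few-shot turns shape *how* the model answers, and the embedded context supplies *what* it answers with; both are cheap to iterate on compared with a fine-tuning run.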