Bot for a specific KB that is longer than 4000 tokens

I am trying to create a bot that will answer questions specifically from a KB and not wander about in the wild.

The KB is way longer than the 4000 token limit.

I tried summarizing it, breaking processes down into smaller tasks, and splitting prompts, but none of it cuts it.

Is there a way I can ‘train my KB’ and create prompts that would exclusively return completions based on the KB?

Is embedding a potential way of doing this - if so, how would you approach it?

Thanks!

What is a KB? There are several approaches to training bots.

Oh look, someone like me.

When you say splitting prompts won’t cut it, what exactly are you saying? What you should be trying to do (or rather, what I’m trying to do) is enter your document one paragraph at a time. Like this:

{"prompt": "", "completion": "paragraph1"}
{"prompt": "", "completion": "paragraph2"}
etc etc.

Don’t forget your end tags. I’ve also tried entering it one sentence at a time, but I don’t think that’s likely to be the best approach. (Thank god the company is paying for this.) Then you get to fine-tune it. Good luck with that.

Maybe it’s the document I’m working from (it’s more like a guideline/outline hybrid), but this has proven painful and difficult. I want to automate it, but I worry about sacrificing the quality of my fine-tunes. This seems especially important early on, with the initial fine-tune. I’m no AI expert, but I’d be happy to share my experience with this. And if you make any progress, post it up. I’d love to see how you make out.
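For anyone who wants to automate the paragraph-by-paragraph step, here is a minimal sketch of how it might look. The function name `paragraphs_to_jsonl`, the `" END"` end tag, and the leading space on each completion are my own assumptions for illustration, not something required by this thread; check the fine-tuning guide for the format your model expects.

```python
import json

def paragraphs_to_jsonl(text, end_tag=" END"):
    """Build one JSONL record per paragraph, each with an empty prompt.

    `end_tag` is a hypothetical end marker; use whatever stop sequence
    you plan to pass at inference time.
    """
    lines = []
    # Paragraphs are assumed to be separated by blank lines.
    for block in text.split("\n\n"):
        paragraph = " ".join(block.split())  # collapse line wraps and extra spaces
        if not paragraph:
            continue  # skip empty blocks left by runs of blank lines
        record = {"prompt": "", "completion": " " + paragraph + end_tag}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)

sample = "First paragraph of the KB.\n\nSecond paragraph,\nwrapped over two lines."
print(paragraphs_to_jsonl(sample))
```

Splitting on blank lines is the simplest choice; a guideline/outline style document may need a smarter splitter (for example, one record per heading plus its body).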

knowledge base probably


You don’t need to fine-tune the model. Just break your document into logical components and create an embedding for each one. I would recommend using DaVinci to create the embeddings. Then write a prompt that asks the model to answer the question truthfully and to say “Unknown” if the question can’t be answered truthfully. Depending on what you are trying to do, retrieve anywhere between 1 and 10 of the components whose embeddings best match the question, include them in the prompt, and use the completion API to generate the answer.
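The retrieval flow above can be sketched roughly as follows. To keep it runnable without an API key, `toy_embed` is a stand-in that just counts words; it is not a real embedding model, and in practice you would swap in a call to an actual embedding endpoint and cache the vector for each chunk. All function names and the sample KB here are made up for illustration.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in embedding (word counts). Replace with a real embedding model."""
    return Counter(word.strip(".,?!").lower() for word in text.split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_chunks(question, chunks, embed=toy_embed, k=3):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    # A real system would embed the chunks once up front, not on every query.
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(question, chunks, embed=toy_embed, k=3):
    """Assemble a completion prompt restricted to the retrieved context."""
    context = "\n\n".join(top_chunks(question, chunks, embed, k))
    return (
        "Answer the question truthfully using only the context below. "
        'If the answer is not in the context, say "Unknown".\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

kb = [
    "To reset your password, open Settings and choose Security.",
    "Invoices are emailed on the first day of each month.",
    "Support is available on weekdays from 9 to 5.",
]
print(build_prompt("How do I reset my password?", kb, k=1))
```

The resulting string is what you would send to the completion API. Because only the top few chunks go into the prompt, the full KB never has to fit inside the 4000-token limit.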
