Making a fine-tuned Davinci an SME on a certain topic

Hey everyone!

Say I wanted davinci to be an SME on a certain topic, and say for instance I had a clean textbook on that topic (suppose no images for now) that I fed into davinci.

How would one set up the training file that goes into davinci so that it understands all the content within that textbook and could actually become an SME on that topic?

Thanks!

It depends on the topic. If it's history or another fact-based subject, then you're better off with a Q&A system. If it requires more executive function (like solving problems), then your fine-tuning data will look very different.

So say for instance it was pet health content, like if I wanted GPT-3 to be an SME on a specific type of cancer in dogs.

How would I feed in the training data so that davinci will understand the content and then write articles from it?

Content writing is different from being an SME. I'm a little confused about what you want to do here; you're being a little vague, perhaps not deliberately. It would be helpful if you gave very specific examples of the outcomes you're looking for.

Ahh sorry, let me explain my process a bit:
I want to train davinci to write long-form articles/blogs from a corpus of dog health articles I've found. I've spent the last few weeks collecting this training data for davinci: fairly technical articles from animal blogs and scientific publications, which I've then cleaned (removed images, weird formatting, etc.).

I'm now at the step where I need to upload a training file to OpenAI in "Prompt… Completion" format. I'm not sure how best to structure the training file so that davinci will perform well on open-ended text generation.
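For illustration, here's a minimal Python sketch of the kind of prompt/completion JSONL file I mean. The example pairs are made up, and the `\n\n###\n\n` separator and ` END` stop sequence are just common conventions, not requirements:

```python
import json

# Hypothetical prompt/completion pairs drawn from cleaned articles.
# Each JSONL line is one training example.
examples = [
    {
        "prompt": "Write an article section about early signs of lymphoma in dogs.\n\n###\n\n",
        "completion": " Swollen lymph nodes are often the first sign owners notice. END",
    },
    {
        "prompt": "Write an article section about treatment options for canine lymphoma.\n\n###\n\n",
        "completion": " Chemotherapy remains the most common treatment. END",
    },
]

# Write one JSON object per line (the JSONL format the upload expects).
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```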

At the end of the process, I want to type in a prompt like "What are some different ways to mitigate cancer in dogs at different stages of their life?" and have davinci generate content from that corpus of documents.

Hope this helps paint the picture a bit better!

Here's how I'd approach it: Answer complex questions from an arbitrarily large set of documents with vector search and GPT-3 - YouTube

Basically the only change is that you’re looking to produce longer answers from a variety of sources. But the underlying principle is the same.

Ah interesting, so you never actually fine-tune a model. When would you ever fine-tune models if you can just feed everything into davinci-002?

I haven't worked with fine-tuning, but…

One reason would be to save on tokens. If you're supplying a bunch of text along with your query, those tokens count toward your costs. Whereas if you fine-tune and run queries against that model, the "source text" you trained on doesn't cost anything per query; only the generated response does.

Another reason could be token limits; I think there's a 4k limit per query, shared between prompt and completion. So if you're using 3,500 tokens of source text inside the prompt, you'll be limited in how much you can generate. Whereas with a fine-tuned model you can get the full 4k for output.
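The budget math, assuming a shared 4,096-token context window (the exact limit is model-dependent):

```python
# Rough token budgeting for a completion call: prompt and completion
# share one context window, so every prompt token is one fewer
# completion token.
CONTEXT_WINDOW = 4096

def max_completion_tokens(prompt_tokens, context_window=CONTEXT_WINDOW):
    return max(context_window - prompt_tokens, 0)

# Stuffing 3,500 tokens of source text into the prompt leaves little room:
print(max_completion_tokens(3500))  # 596
# A fine-tuned model with a short prompt keeps almost the whole budget:
print(max_completion_tokens(50))    # 4046
```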

I bet fine-tuned models would be faster, too.

Dave's approach uses embeddings.