Use "private" dataset as basis for AI responses

obeydesign · July 8, 2021, 6:30pm

An example will help explain my question. I have a dataset of 5k book descriptions, and I would like GPT-3 to use this dataset to help me write new book descriptions. Can I do something like this? Can I train GPT-3, and if so, how?

I’m not a dev, I’m a writer, learning Bubble to use as the front-end solution for deployment.

Thanks!

daveshapautomator · July 8, 2021, 10:09pm

GPT-3 can already write great book descriptions. Just use a few-shot prompt.
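Here is a minimal sketch of what a few-shot prompt for this task could look like, assuming Python and placeholder data. The field labels, the `###` separator, and the helper names (`format_record`, `build_few_shot_prompt`) are illustrative choices, not an official OpenAI format. The idea is to show a couple of finished examples, then leave the last "Back cover copy:" open for the model to continue.

```python
# Placeholder examples: replace with real (genre, title, character, blurb) rows.
EXAMPLES = [
    {"genre": "Mystery", "title": "Book A", "character": "Ada",
     "blurb": "<back cover copy for book A>"},
    {"genre": "Fantasy", "title": "Book B", "character": "Bram",
     "blurb": "<back cover copy for book B>"},
]

SEPARATOR = "\n###\n"  # marks where one example ends and the next begins


def format_record(genre, title, character, blurb=""):
    """Lay out one book as a labeled record; an empty blurb leaves it open."""
    return (
        f"Genre: {genre}\n"
        f"Title: {title}\n"
        f"Main character: {character}\n"
        f"Back cover copy:{' ' + blurb if blurb else ''}"
    )


def build_few_shot_prompt(examples, genre, title, character):
    """Solved examples first, then the new book with its blurb left blank."""
    shots = [format_record(ex["genre"], ex["title"], ex["character"], ex["blurb"])
             for ex in examples]
    shots.append(format_record(genre, title, character))
    return SEPARATOR.join(shots)


print(build_few_shot_prompt(EXAMPLES, "Mystery", "New Book", "Mira"))
```

The resulting string is what would be sent as the prompt; the model's continuation after the final label is the new blurb.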

obeydesign · July 8, 2021, 10:18pm

Thank you for responding. I just used book descriptions as an example. It’s not the dataset I want to work with.

daveshapautomator · July 8, 2021, 11:35pm

Could you give an example of what you want to output?

obeydesign · July 9, 2021, 12:14am

I’m afraid I need to be cagey, as my plan is to make this a business. To use my book example, let’s say I have thousands of book back cover descriptions, along with their genres. In my application, you’d pick your genres, add a few other elements like a main character name, and the system would generate a back cover description using the main character name, the genres, and information drawn from novels in similar genres.

My question is, how do I teach GPT-3 the back cover descriptions, genres (and probably book names, too), to generate output?

Does that make sense?

daveshapautomator · July 9, 2021, 12:32am

It does make sense, and the answer remains the same: use a few-shot prompt. GPT-3 is smarter than you think.

obeydesign · July 9, 2021, 1:37am

Not ideas for new books (for this example), but the creation of back cover copy for a “submitted” book. In other words: I have a book, I need back cover copy for it, I enter some information into the interface, and I get back a paragraph or two of back cover copy for this book.

A quick edit: this need for particular verbiage and structure is why I think, though I cannot say with certainty, that I need to teach GPT-3 how to write back cover copy by giving it a number of examples of back cover copy from other books.

obeydesign · July 9, 2021, 1:56am

I’ve perused the workshop but didn’t find anything similar, even after poking through a few of the entries just to see whether anything resonated at a broad level. Guess I’ll keep looking.

With that said, what is the mechanism for teaching GPT-3 with, for instance, a spreadsheet of information?

dennyroberts · July 9, 2021, 3:00pm

So this is my understanding, let me know if I’m off base:

If you want GPT-3 to do a predictable transformation or generation (say, you input a book and it outputs a description), the best way to do that is to give it 3-5 prime examples, and it will do the rest.
Even if you have 5000 examples, more examples won’t help it and/or will be computationally expensive to include every time you hit the API.
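To put rough numbers on the cost point, here is a back-of-the-envelope sketch. It assumes about 4 characters per token for English text (a rule of thumb, not an exact tokenizer count) and a 2048-token context window, the limit of the original GPT-3 models such as davinci; the window is shared by the prompt and the generated output. The helper names and the default budgets are hypothetical.

```python
# Rough arithmetic only. Assumptions: about 4 characters per token for
# English text (a rule of thumb, not an exact tokenizer count) and a
# 2048-token context window, the limit of the original GPT-3 models.
CHARS_PER_TOKEN = 4
CONTEXT_TOKENS = 2048


def tokens_for_chars(n_chars: int) -> int:
    """Approximate token count, rounded up."""
    return -(-n_chars // CHARS_PER_TOKEN)


def max_examples(avg_example_chars: int,
                 instruction_tokens: int = 50,
                 output_tokens: int = 300) -> int:
    """How many examples of the given average length fit in one request."""
    budget = CONTEXT_TOKENS - instruction_tokens - output_tokens
    return max(budget // tokens_for_chars(avg_example_chars), 0)


# A 1,000-character blurb is ~250 tokens, so only a handful fit per call,
# while all 5,000 such blurbs would be ~1.25 million tokens.
print(max_examples(1000))              # 6
print(5000 * tokens_for_chars(1000))   # 1250000
```

So with the assumed limits, a few examples per request is a hard ceiling, not just a cost preference.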

Is this basically correct? And so if you do happen to have 5000 great examples, is there any way to incorporate those into the request every time, if you aren’t getting perfect results using the few-shot prompt?

daveshapautomator · July 9, 2021, 3:19pm

I would say just read the documentation: OpenAI API

dennyroberts · July 9, 2021, 3:37pm

It doesn’t seem like there’s anything about incorporating large, custom data sets in the “Get Started” section, but the API Reference mentions uploading “Files” which can include examples. I didn’t see it list an optimal number of examples, but the file size limit is 1GB so I’m guessing it can include quite a bit.

Thoughts on this? Any tutorials or anything explaining this further?

But also, it seems like the “examples” parameter is only used in the Classification endpoint. It’s my understanding that the Completion endpoint would be used for something like text summarization, right?

daveshapautomator · July 9, 2021, 3:51pm

Storing files is more for Answers. Anyways, I was trying not to be rude and repetitive: GPT-3 is not a conventional ML model where you give it more and more training samples. Make sure you understand what “zero-shot” and “few-shot” mean. For the task described, you probably need 1 or 2 examples, but since no example was given, I cannot help with the prompt. I would need to know what the output is supposed to look like in order to help.

GPT-3 can be fine-tuned, but that is still in closed beta or closed alpha AFAIK. Anyway, you would not need to fine-tune it for this task.
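For reference, here is a minimal sketch of how a spreadsheet (exported as CSV) could be turned into the JSONL prompt/completion records that OpenAI's original GPT-3 fine-tuning used. The column names, the `\n\n###\n\n` prompt separator, the leading space in the completion, and the ` END` stop marker are assumptions modeled on the conventions suggested in that legacy guide; newer models expect a different, chat-style format, so check the current documentation before uploading anything.

```python
import csv
import io
import json

# Hypothetical spreadsheet export with columns "genre", "title", "blurb".
SAMPLE_CSV = """genre,title,blurb
Mystery,Placeholder Title A,<back cover copy for book A>
Romance,Placeholder Title B,<back cover copy for book B>
"""


def rows_to_jsonl(csv_text: str) -> str:
    """Convert CSV rows into one JSON prompt/completion object per line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        record = {
            # A fixed separator marks where the input stops.
            "prompt": f"Genre: {row['genre']}\nTitle: {row['title']}\n\n###\n\n",
            # Completion starts with a space and ends with a stop marker.
            "completion": " " + row["blurb"] + " END",
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


print(rows_to_jsonl(SAMPLE_CSV))
```

As the reply above says, this is likely unnecessary for back cover copy; it only shows what "teaching with a spreadsheet" would mean mechanically.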

obeydesign · July 9, 2021, 5:45pm

I wouldn’t be giving 5000 texts of complete books, just the back cover copy from 5000 books. For example.

Also, for this example, back cover copy is written to sell the book, so the language and style it uses are important to maintain.

obeydesign · July 9, 2021, 5:56pm

My plan (my tacit plan) is to give the user the ability to filter by a number of variables, such as genre, and use the matching results as templates to generate new data.
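The filtering step in that plan can be sketched as a simple metadata match: keep stored books whose genres overlap the user's choices, rank by how many genres overlap, and hand the top few to the prompt as templates. The records, field names, and `pick_templates` helper are hypothetical placeholders, and this is plain tag matching, not semantic search.

```python
# Hypothetical library records; "genres" is a set of lowercase tags.
LIBRARY = [
    {"title": "Book A", "genres": {"mystery", "thriller"}, "blurb": "<blurb A>"},
    {"title": "Book B", "genres": {"romance"}, "blurb": "<blurb B>"},
    {"title": "Book C", "genres": {"mystery", "cozy"}, "blurb": "<blurb C>"},
]


def pick_templates(library, wanted_genres, limit=3):
    """Return up to `limit` books sharing at least one wanted genre."""
    wanted = {g.lower() for g in wanted_genres}
    scored = [(len(book["genres"] & wanted), book) for book in library]
    # sorted() is stable, so ties keep their original library order.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [book for score, book in ranked if score > 0][:limit]


print([b["title"] for b in pick_templates(LIBRARY, ["Mystery", "Thriller"])])
```

The chosen blurbs would then be formatted as the few-shot examples in the prompt.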

obeydesign · July 9, 2021, 6:19pm

Excellent! That is basically my “I’m not a dev” plan.

daveshapautomator · July 10, 2021, 10:08pm

It occurred to me that if you have a library of 5000 examples you can use semantic search to pick 2 or 3 that are close to the new one you want to generate to use as samples.
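A minimal sketch of that selection step: rank stored examples by cosine similarity between vectors and keep the top k. The vectors below are tiny hard-coded placeholders; in practice each one would come from an embedding model (which model, and how it is called, is left open here), and `top_k` is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, library, k=3):
    """Return the k library items whose vectors are closest to the query."""
    ranked = sorted(library, key=lambda item: cosine(query_vec, item["vec"]),
                    reverse=True)
    return ranked[:k]


# Placeholder 3-dimensional vectors standing in for real embeddings.
LIBRARY = [
    {"title": "Book A", "vec": [0.9, 0.1, 0.0]},
    {"title": "Book B", "vec": [0.0, 1.0, 0.0]},
    {"title": "Book C", "vec": [0.8, 0.3, 0.1]},
    {"title": "Book D", "vec": [0.0, 0.2, 0.9]},
]
query = [1.0, 0.2, 0.0]  # embedding of the new book's description (placeholder)
print([item["title"] for item in top_k(query, LIBRARY, k=2)])
```

With 5000 stored examples, the embeddings would be computed once and stored, so each new request only needs one embedding for the query.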

obeydesign · July 11, 2021, 1:22am

Thanks for that. This is the conclusion I’ve also come to, so you’ve given me expert confirmation. Would I still use zero-shot and/or few-shot prompting for the rest?

I’m currently learning Bubble as the solution to keep the data, perform the searches, and integrate the GPT-3 API key.

NSY · July 11, 2021, 10:03am

Use semantic search plus a query, and put some instructions inside the text that the semantic search picks up. That way, even if the AI doesn’t get the full amount of text you wanted it to have, the added instructions help, and you end up with a modified prompt for every query.
For example, the texts could include a structure such as "Human: \nAI: \n".
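A sketch of that suggestion: wrap the texts returned by the search in a prompt that carries extra instructions and ends in the "Human:/AI:" turn format, so every query gets its own modified prompt. The instruction wording, the "Example:" label, and the `build_retrieval_prompt` helper are hypothetical.

```python
def build_retrieval_prompt(instruction, retrieved, request):
    """Instructions first, then the retrieved texts, then an open AI turn."""
    context = "\n\n".join(f"Example:\n{text}" for text in retrieved)
    return (
        f"{instruction}\n\n"
        f"{context}\n\n"
        f"Human: {request}\n"
        f"AI:"
    )


prompt = build_retrieval_prompt(
    "Write back cover copy in the style of the examples.",
    ["<blurb A>", "<blurb C>"],  # e.g. the top matches from semantic search
    "Genre: mystery. Main character: Mira.",
)
print(prompt)
```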

obeydesign · July 11, 2021, 7:27pm

Thank you for writing. At the moment, this is above my current knowledge, but it gives me something to work toward.

craig.thomler · July 14, 2021, 8:39am

We’re writing book blurbs already with GPT-3 as part of our solution.

We find few-shotting gives a response good enough for 98% of authors.
