I’ve been working on this problem some more. I may have partly untangled it.
Start by breaking your database down into very atomic entries (individual memories, chat logs, news items from RSS feeds, etc.): a few sentences each, maximum.
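Here is a rough sketch of what I mean by atomic entries, in Python. The regex-based sentence split and the example article are just placeholders; a real pipeline would probably use a proper sentence tokenizer:

```python
import re

def to_atomic_snippets(text, max_sentences=3):
    """Naively split a document into snippets of at most a few sentences each.
    (Sketch only; swap in a real sentence tokenizer for production use.)"""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [' '.join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]

# Example: one RSS item becomes one or more short, atomic entries.
article = ("The probe entered orbit on Tuesday. Mission control confirmed "
           "all instruments are healthy. Science operations begin next month. "
           "The first data release is expected in the fall.")
for snippet in to_atomic_snippets(article):
    print(snippet)
```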
Then you can use an index/search tool (like Solr or Elasticsearch) to find relevant snippets, even if they come from very different sources: news articles, Wikipedia articles, previous conversations, PubMed papers, etc. Because the snippets are so short, you can rapidly compile them into a reasonably sized document.
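As a sketch of that search-and-compile step (assuming a local Elasticsearch cluster and the 8.x Python client; the index name, example snippets, and question are made up):

```python
from elasticsearch import Elasticsearch  # assumes the 8.x Python client

es = Elasticsearch("http://localhost:9200")  # hypothetical local cluster

# Index each atomic snippet, tagged with where it came from.
snippets = [
    ("rss",  "Mission control confirmed all instruments are healthy."),
    ("wiki", "The probe was launched in 2021 to study the outer planets."),
]
for i, (source, text) in enumerate(snippets):
    es.index(index="snippets", id=i, document={"source": source, "text": text})

# At question time, pull back the most relevant snippets, regardless of source,
# and concatenate them into one reasonably sized context document.
question = "Are the probe's instruments working?"
resp = es.search(index="snippets", query={"match": {"text": question}}, size=20)
context = "\n".join(hit["_source"]["text"] for hit in resp["hits"]["hits"])
print(context)
```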
Then, with that reasonably sized document as context, you can rely on GPT-3’s internal understanding of the world to produce good answers to any problem. (In theory, anyway; this last part may be wishful thinking on my part.)
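That last step might look something like this, assuming the older (pre-1.0) openai Python library and its Completion endpoint; the model name, prompt wording, and placeholder context/question are just illustrative:

```python
import openai  # assumes the pre-1.0 openai library and its Completion endpoint

openai.api_key = "sk-..."  # placeholder

context = "Mission control confirmed all instruments are healthy."  # compiled snippets
question = "Are the probe's instruments working?"

prompt = (
    "Answer the question using only the information below.\n\n"
    f"{context}\n\n"
    f"Question: {question}\nAnswer:"
)
response = openai.Completion.create(
    model="text-davinci-003",  # any GPT-3 completion model you have access to
    prompt=prompt,
    max_tokens=256,
    temperature=0.2,
)
print(response["choices"][0]["text"].strip())
```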
In the future, hopefully GPT-4 can ingest 20,000 tokens instead of 2,000, so you can give it larger chunks of information. Maybe GPT-5 can take in 2M tokens.
Anyways, in the meantime, I think atomic/granular entries are the way to go.