Searching With Wildcards In OpenAI's GPT-3 Database


I heard somewhere that OpenAI has analyzed hundreds of terabytes of information and has probably read almost every book, song lyric, etc. on the planet.

I am curious about OpenAI for several reasons, one of which is using it to get inspired for writing.

I have two questions:

  1. Does OpenAI have the following functionalities?

For example, there’s a linguistic search engine called WebCorp LSE. You can type “you are {NOUN} of my” and it will look in its database and return results like this (For Open AI.PNG - Google Drive), listing every sentence it has found that contains the pattern “you are *(any noun) of my”. (Please see the example above.)

Can OpenAI search within its database, find all the sentences that contain a specific sentence structure with specific words, for example “you are *(any noun) of my”, “you are *(any noun) to my”, etc., and show all the examples it has found?

Note: The problem is that this linguistic search engine has only scraped blog posts on WordPress and Google. If OpenAI has read literature books, that will be a game-changer.

If OpenAI has read literature books as well, then for the search “you are {NOUN} of my” I expect to get results like “you are gift of my life”, “you are center of my world”, “you are source of my happiness”, etc., which would help me take inspiration for my work. I could search many patterns like these, for example “you are like a {NOUN}”, “you {VERB} like”, etc., and it would really change my writing capabilities.

My second question is: has OpenAI read almost all the literature books that are available in digital format?

Thanks a lot!

It can do the functions you are describing; however, you would have to provide the data in a dataset.

If you do not want to provide a dataset, you can think of its default dataset as everything that was freely accessible on the internet before 2021.

You should just sign up and test it out; it’s free for a few months.


GPT-3 does not have access to its dataset. The dataset is represented by weights and nodes in its neural network.

LaMDA by Google (no public access), however, has access to its own datasets.
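Since the model cannot look up sentences in its training data, a WebCorp-style pattern search has to run over text you hold yourself. Below is a minimal sketch of that idea using only Python's standard `re` module. The function names, the `max_words` parameter, and the sample sentences are made up for illustration, and `*` here matches any one to three words rather than specifically a noun (restricting it to nouns would need a part-of-speech tagger, which this sketch leaves out).

```python
import re

def build_wildcard_regex(pattern, max_words=3):
    """Turn a pattern like 'you are * of my' into a compiled regex.

    Each '*' stands for 1..max_words consecutive words (any words, not
    only nouns); every other token must match literally.
    """
    pieces = []
    for token in pattern.split():
        if token == "*":
            # One word, optionally followed by up to max_words - 1 more.
            pieces.append(r"(\w+(?:\s+\w+){0,%d})" % (max_words - 1))
        else:
            pieces.append(re.escape(token))
    return re.compile(r"\b" + r"\s+".join(pieces) + r"\b", re.IGNORECASE)

def wildcard_search(pattern, sentences, max_words=3):
    """Return (sentence, wildcard_fillers) for every matching sentence."""
    regex = build_wildcard_regex(pattern, max_words)
    hits = []
    for sentence in sentences:
        match = regex.search(sentence)
        if match:
            hits.append((sentence, match.groups()))
    return hits

if __name__ == "__main__":
    # Hypothetical sample corpus; replace with sentences from your own texts.
    corpus = [
        "You are the center of my world.",
        "You are my friend.",
        "you are a gift of my life",
    ]
    for sentence, fillers in wildcard_search("you are * of my", corpus):
        print(fillers, "<-", sentence)
```

The same function can be pointed at any list of sentences, for example ones split out of public-domain books you have downloaded as plain text.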


Thanks for replying, SaturnProductions

  1. “It can do the functions you are describing however you would have to provide the data in a dataset.” - You mean that I have to upload the text of all the books that I want the bot to search through, and it can then search through my uploaded data. Did I understand that correctly?

  2. So you are saying that it has parsed through all the data that is available on the web, but it did not read literature books, is that correct?

Thanks a lot!

  1. It specifically asks for a .jsonl file with special formatting, but yes, that is basically what you have to do. They have a great guide in the documentation.

  2. I don’t know the answer to that, because it’s possible to find lots of books for free legally on the internet, but maybe not the specific one you want.

I did a quick test and it’s pretty good at understanding what you want, except that it has trouble giving exact counts of things. It needs to be trained.
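As a rough illustration of what a .jsonl dataset looks like: JSON Lines is simply one JSON object per line. The sketch below assumes each record has `prompt` and `completion` fields, which is how I remember the fine-tuning guide describing it; treat the field names, the helper names, and the sample records as assumptions and check the documentation for the exact format it expects.

```python
import json

def records_to_jsonl(records):
    """Serialize a list of dicts as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)

def write_jsonl(records, path):
    """Write the records to a .jsonl file at the given path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(records_to_jsonl(records))

if __name__ == "__main__":
    # Hypothetical training examples; the field names are an assumption.
    examples = [
        {"prompt": "you are * of my ->", "completion": " you are the center of my world"},
        {"prompt": "you are like a * ->", "completion": " you are like a summer storm"},
    ]
    print(records_to_jsonl(examples), end="")
```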

Example input

List 5 classic literature books that you have completely read:

Example output

-The Catcher in the Rye
-To Kill a Mockingbird
-The Great Gatsby
-The Grapes of Wrath
-Animal Farm

Input again:

How many times does the exact word "champagne" appear in the book "The Great Gatsby"?


The word "champagne" appears 9 times in the book "The Great Gatsby."

I don’t think that’s correct. I did a search in a PDF file of The Great Gatsby that I found, and it said 7,
so it would need training and playing around to get it to do it correctly.
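For an exact count like this, checking the text directly is more dependable than asking the model. Here is a minimal sketch of a whole-word count using Python's `re` module; the function name and the sample sentence are made up, and you would load the book's text yourself (for example from a plain-text copy). The count can differ between editions and from a PDF viewer's search, which may also match partial words.

```python
import re

def count_word(text, word):
    """Count case-insensitive whole-word occurrences of `word` in `text`.

    Word boundaries (\\b) keep 'champagne' from matching inside
    longer words such as 'champagnes'.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))

if __name__ == "__main__":
    # Hypothetical sample text; replace with the full text of the book.
    sample = "Champagne was served. The champagne, in glasses; champagnes don't count."
    print(count_word(sample, "champagne"))
```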