I heard somewhere that OpenAI has analyzed hundreds of terabytes of information and has probably read almost every book, song lyric, and so on that exists.
I am curious about OpenAI for several reasons, and one of them is using it as inspiration for writing.
I have two questions:
Does OpenAI have the following functionality?
For example, there’s a linguistic search engine called WebCorp LSE. You can type “you are {NOUN} of my” and it will search its database and return results like this > For Open AI.PNG - Google Drive > listing every sentence it has found that contains “you are *(any noun) of my”. (Please see the example above.)
Can OpenAI search its database, find all the sentences that match a specific sentence structure with specific words, for example “you are *(any noun) of my” or “you are *(any noun) to my”, and show every example it has found?
Note: The problem is that this linguistic search engine has only scraped blog posts on WordPress and Google. If OpenAI has read literature books, that will be a game-changer.
If OpenAI has read literature books as well, then for the search “you are {NOUN} of my” I would expect results like “you are gift of my life”, “you are center of my world”, “you are source of my happiness”, etc., which would help me find inspiration for my work. I could search many patterns like this, for example “you are like a {NOUN}” or “you {VERB} like”, and it would really improve my writing.
My second question: has OpenAI read almost all the literature books that are available in digital format?
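To make the kind of query described above concrete, here is a minimal Python sketch of a template search over text you already have locally. It says nothing about how OpenAI or WebCorp LSE work internally. The function name `find_pattern` and the sample sentences are hypothetical, and `{NOUN}` is approximated by any single word, because real part-of-speech filtering would need a tagger.

```python
import re

PLACEHOLDER = "{NOUN}"  # hypothetical wildcard token, borrowed from the query syntax above


def find_pattern(sentences, template):
    """Return (sentence, filler_word) pairs for sentences that match the template."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain exactly one {NOUN} placeholder")
    before, after = template.split(PLACEHOLDER)
    # Assumption: the placeholder matches ANY single word, not only nouns.
    pattern = re.compile(
        r"\b" + re.escape(before) + r"(\w+)" + re.escape(after) + r"\b",
        re.IGNORECASE,
    )
    results = []
    for sentence in sentences:
        match = pattern.search(sentence)
        if match:
            results.append((sentence, match.group(1)))
    return results


# Made-up sample corpus; in practice this would be sentences from your own texts.
corpus = [
    "You are center of my world.",
    "You are late again.",
    "you are gift of my life",
]

for sentence, word in find_pattern(corpus, "you are {NOUN} of my"):
    print(f"{word}: {sentence}")
```

A search like this only covers text you supply yourself, which is why the source of the corpus matters so much for this question.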
“It can do the functions you are describing however you would have to provide the data in a dataset.” - Do you mean that I have to upload the text of all the books I want the bot to search through, and that it can then search my uploaded data? Did I understand that correctly?
So you are saying that it has parsed through all the data that is available on the web, but it did not read literature books, is that correct?
It specifically asks for a .jsonl file with special formatting, but yes, that is basically what you have to do. They have a great guide in the documentation.
I don’t know the answer to that, because it’s possible to find lots of books legally for free on the internet, though maybe not the specific one you want.
I did a quick test, and it’s pretty good at understanding what you want, except that it has trouble giving exact counts of things. It would need to be trained.
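For anyone unfamiliar with .jsonl: it is simply one JSON object per line. Below is a rough Python sketch of producing such a file's contents. The field names `prompt` and `completion` are an assumption on my part, as are the sample records, so check the guide in the documentation for the exact keys it expects.

```python
import json


def to_jsonl(records):
    """Serialize a list of dicts as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record, ensure_ascii=False) + "\n" for record in records)


# Hypothetical records. The "prompt"/"completion" keys are assumed, not confirmed here.
records = [
    {"prompt": "you are {NOUN} of my", "completion": "you are center of my world"},
    {"prompt": "you are {NOUN} of my", "completion": "you are source of my happiness"},
]

jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

Writing `jsonl_text` to a file with a `.jsonl` extension gives you something to upload once the keys match the guide.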
Example input
List 5 classic literature books that you have completely read:
Example output
-The Catcher in the Rye
-To Kill a Mockingbird
-The Great Gatsby
-The Grapes of Wrath
-Animal Farm
input again:
How many times does the exact word " champagne" appear in the book "The Great Gatsby"?
output:
The word "champagne" appears 9 times in the book "The Great Gatsby."
I don’t think that’s correct. I searched a PDF of The Great Gatsby that I found, and it reported 7 occurrences.
So it would need training and some playing around to get it to answer correctly.
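For exact counts like this, a plain text search is more dependable than asking the model. Here is a small Python sketch of a whole-word, case-insensitive count. The function name `count_word` and the sample text are made up for illustration, and the sample is not taken from the book.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


# Made-up sample text; "champagnes" is not counted because it is a different word.
sample = "Champagne was served. The champagne glasses glittered; champagnes differ."
print(count_word(sample, "champagne"))  # prints 2
```

To check a whole book, read a plain-text copy into a string first, for example with `open(path, encoding="utf-8").read()`, where `path` points to your own file. Different editions and PDF extraction quirks can change the count slightly.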