I’ve written a script that reads Wikipedia and writes prompts from the page content.
The prompt is <h1>\n<h2>\n<h3> from the page and the completion is the corresponding collection of paragraphs under those headings. In this instance I’d like the completion to come up in a conversation about the name of Colchester. It seems a bit massive and there’s not a lot of focus. I’m new to writing prompts, so could someone give me an idea of how the AI might react to big chunks like that, please? How can I improve the prompt? The plan is to just leave the program scraping everything post-2021 on Wiki, so the content is mostly going to be similar to the example below.
{"prompt": "Title: Colchester\n Header: Name\n Subheader: ++++", "completion": " There are several theories about the origin of the name Colchester. Some contend that is derived from the Latin words colonia (referring to a type of Roman settlement with rights equivalent to those of Roman citizens, one of which was believed to have been founded in the vicinity of Colchester) and castra, meaning fortifications (referring to the city’s walls, the oldest in Britain). The earliest forms of the name Colchester are Colenceaster and Colneceastre from the 10th century, with the modern spelling of Colchester being found in the 15th century. In this way of interpreting the name, the River Colne which runs through the city takes its name from Colonia as well. Cologne (German Köln) also gained its name from a similar etymology (from its Roman name Colonia Claudia Ara Agrippinensium). Other etymologists are confident that the Colne’s name is of Celtic (pre-Roman) origin, sharing its origin with several other rivers Colne or Clun around Britain, and that Colchester is derived from Colne and Castra. Ekwall went as far as to say \"it has often been held that Colchester contains as first element [Latin] colonia … this derivation is ruled out of court by the fact that Colne is the name of several old villages situated a good many miles from Colchester and on the Colne. The identification of Colonia with Colchester is doubtful.\" The popular association of the name with King Coel has no academic merit. ####"}
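For reference, a record shaped like the one above could be produced along these lines. This is only a sketch: `build_record` and its parameters are hypothetical names, not the actual script, and it assumes the " ++++" prompt separator and " ####" stop marker are used exactly as in the example.

```python
import json

SEPARATOR = " ++++"  # end-of-prompt marker, copied from the example record
STOP = " ####"       # end-of-completion marker, copied from the example record


def build_record(title, header, subheader, paragraphs):
    """Return one JSONL line for a page section (hypothetical helper)."""
    # rstrip() drops the trailing space left behind when the subheader is empty.
    prompt = f"Title: {title}\n Header: {header}\n Subheader: {subheader}".rstrip()
    # The completion starts with a space and ends with the stop marker.
    completion = " " + " ".join(p.strip() for p in paragraphs) + STOP
    record = {"prompt": prompt + SEPARATOR, "completion": completion}
    # ensure_ascii=False keeps characters such as "ö" in "Köln" readable.
    return json.dumps(record, ensure_ascii=False)


if __name__ == "__main__":
    line = build_record(
        "Colchester",
        "Name",
        "",
        ["There are several theories about the origin of the name Colchester."],
    )
    print(line)
```

Letting `json.dumps` write the line takes care of escaping newlines and any quotation marks inside the paragraphs.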
AFAIK ChatGPT was trained using Wikipedia, so giving a Wikipedia page in the prompt is wasteful. Also, giving the page will not limit the result to just that page, but it might increase the attention paid to that page.
Instead, just ask ChatGPT the question and see what the completion says.
What might be better is to ask ChatGPT, then verify the results against the info in the Wikipedia page, and if ChatGPT makes a mistake, correct it and ask for an update.
I’ve looked at tokens. I’m going to max out my allowance really quick! That’s inevitable if I’m scraping wiki though.
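A very rough way to gauge how fast the allowance will go is sketched below. It assumes the common rule of thumb of about four characters per token for English text, which is only an approximation and not a real tokenizer; `estimate_tokens` and `estimate_record_tokens` are hypothetical helper names.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rule-of-thumb average for English, not exact


def estimate_tokens(text):
    """Approximate the token count of a string from its length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_record_tokens(record):
    """Approximate the tokens in one prompt/completion training record."""
    return estimate_tokens(record["prompt"]) + estimate_tokens(record["completion"])


if __name__ == "__main__":
    record = {
        "prompt": "Colchester\nName ++++",
        "completion": " The popular association of the name with King Coel has no academic merit. ####",
    }
    print(estimate_record_tokens(record))
```

Summing this over every scraped section gives a ballpark figure before anything is uploaded; an actual tokenizer would be needed for exact counts.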
It was just how to form the prompt that I was concerned about. For the example above the prompt is made up of two keywords, ‘Colchester’ & ‘Name’, and the completion is three paragraphs.
If I split it this way there are three identical prompts pointing to three different completions, which just feels wrong.
{ prompt: Colchester \n Name completion: paragraph 1 }
{ prompt: Colchester \n Name completion: paragraph 2 }
{ prompt: Colchester \n Name completion: paragraph 3 }
If I don’t split it and keep all the paragraphs in one completion, that’s a lot of information, and I feel it’s probably a bit too much to ask if you’d like responses to be short and direct. Some of Wikipedia’s sections can run to 10+ paragraphs.
Are the keywords ‘Colchester’ and ‘Name’ enough to begin with to have the model respond if I ask about the history of Colchester’s name? Does the prompt itself need to be larger?
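The two layouts being weighed up here can be compared with a small sketch like the following. The function names are hypothetical and the records only mirror the prompt/completion shape shown above; it just makes the "identical prompts" problem visible by counting them.

```python
from collections import Counter


def one_record_per_paragraph(prompt, paragraphs):
    """Layout 1: every paragraph becomes its own record with the same prompt."""
    return [{"prompt": prompt, "completion": " " + p} for p in paragraphs]


def one_record_per_section(prompt, paragraphs):
    """Layout 2: all paragraphs are joined into a single long completion."""
    return [{"prompt": prompt, "completion": " " + " ".join(paragraphs)}]


def duplicate_prompts(records):
    """Return each prompt that appears more than once, with its count."""
    counts = Counter(r["prompt"] for r in records)
    return {p: n for p, n in counts.items() if n > 1}


if __name__ == "__main__":
    paragraphs = ["paragraph 1", "paragraph 2", "paragraph 3"]
    split = one_record_per_paragraph("Colchester\nName", paragraphs)
    joined = one_record_per_section("Colchester\nName", paragraphs)
    print(duplicate_prompts(split))   # the same prompt shows up three times
    print(duplicate_prompts(joined))  # no duplicates, but one long completion
```

Running a check like `duplicate_prompts` over the whole scraped file would show how often the same keywords end up pointing at different completions.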
One of my favorite uses of GPT is reducing complex, sometimes seemingly totally unrelated topics. I agree with @EricGT that it seems wasteful to just reproduce stats from Wikipedia, but maybe instead capitalize on the depth of knowledge to ask prompts that cross domains. Ask about how the etymology of Colchester relates to other, lesser-known topics. You can even get meta and ask GPT, “I have this topic and I want to get more interesting information from you, a large language model, but I’m not sure where to go or what to ask. What questions would you ask?” or “What makes this topic particularly interesting that no one knows about, from the perspective of a large language model, capable of reducing and synthesizing complex information from multiple domains?”
EDIT: also, I actually tell the model it’s a prompt engineer and ask it to help me write prompts… it would know best, right? I find that meta prompts are a secret sauce.
I don’t think so. Only if prompt engineering pages were included in its training set would ChatGPT be able to help. Even then, they would probably have to be about ChatGPT itself to really be of value.
Hello. The information I’ve got already. Plus, each Wikipedia page has dozens of links to go to from there so there’s always more to have.
The problem I’ve got is that if I give the trained model (ada currently) the keywords ‘Colchester’ and ‘Name’, I’d like it to return something from one of the three paragraphs in the prompt below. Currently it talks of the Anglo-Saxon origins, which is just way off since Colchester’s name has zero Anglo-Saxon origins. If an untrained model is a snapshot of the internet up until 2021, it should possibly have had this information already, shouldn’t it?
{“prompt”:
“Colchester\nName ++++”,
“completion”:
" There are several theories about the origin of the name Colchester. Some contend that is derived from the Latin words colonia (referring to a type of Roman settlement with rights equivalent to those of Roman citizens, one of which was believed to have been founded in the vicinity of Colchester) and castra, meaning fortifications (referring to the city’s walls, the oldest in Britain). The earliest forms of the name Colchester are Colenceaster and Colneceastre from the 10th century, with the modern spelling of Colchester being found in the 15th century. In this way of interpreting the name, the River Colne which runs through the city takes its name from Colonia as well. Cologne (German Köln) also gained its name from a similar etymology (from its Roman name Colonia Claudia Ara Agrippinensium).\n
Other etymologists are confident that the Colne’s name is of Celtic (pre-Roman) origin, sharing its origin with several other rivers Colne or Clun around Britain, and that Colchester is derived from Colne and Castra. Ekwall went as far as to say "it has often been held that Colchester contains as first element colonia … this derivation is ruled out of court by the fact that Colne is the name of several old villages situated a good many miles from Colchester and on the Colne. The identification of Colonia with Colchester is doubtful."\n
The popular association of the name with King Coel has no academic merit.\n