Terminology extractor critique / suggestions

juliushamilton100 · October 18, 2021, 4:55pm

I am trying to design a terminology extractor with GPT-3, and for now I am following the tweet classifier template from the documentation. That means the engine is davinci, the response length is 64, the temperature is 0.2, Top P is 1, the stop sequence is “two enter keys”, and the prompt is highly structured: a natural-language instruction, about 9 examples, and 5 sentences for completion.
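For anyone reproducing this outside the Playground, here is a sketch of the same settings written as keyword arguments for the 2021-era Completions endpoint (the legacy openai Python library's Completion.create accepted engine, prompt, max_tokens, temperature, top_p and stop). Reading “response length” as max_tokens and “two enter keys” as a stop string of two newline characters is an assumption about how the Playground maps onto the API, and nothing is sent anywhere here. The helper name describe is hypothetical.

```python
# Sketch only: the Playground settings above as keyword arguments for the
# legacy (2021-era) Completions endpoint. No request is made.
TWEET_TEMPLATE_SETTINGS = {
    "engine": "davinci",  # engine name used in this thread
    "max_tokens": 64,     # Playground "response length" (assumed mapping)
    "temperature": 0.2,
    "top_p": 1,
    "stop": ["\n\n"],     # "two enter keys" = two newline characters
}


def describe(settings: dict) -> str:
    """Hypothetical helper: render settings on one line; repr() keeps the
    newline characters in the stop sequence visible."""
    return ", ".join(f"{key}={value!r}" for key, value in settings.items())


print(describe(TWEET_TEMPLATE_SETTINGS))
```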

This is my prompt, which copies the tweet template completely:

This is a terminology extractor.

Sentence: “The rocketship is fueled with Alpha-Omega Jet Fuel.”
Keywords: Alpha-Omega Jet Fuel

Sentence: Barack Obama will visit Japan on Memorial Day.
Keywords: Memorial Day

Sentence: The metatarsals are attached to the cuneiform bones at a diagonal angle.
Keywords: metatarsal, cuneiform

Sentence: The HTTP protocol is used for sending and receiving HTML requests over the internet.
Keywords: HTTP protocol, HTML request

Sentences:

Happy Birthday John! I got you this memory-foam mattress for the occasion.

The band Green Day usually uses Vox foot pedals with heavy reverb and oscillatory delay.

The Amazon rainforest has been heavily damaged by ammonium acetate in the form of acid rain.

The young boy’s occlusion was malaligned and he was in dire need of a surgical implant.

It was the happiest day in the world, except for a V1 Tesla Roadster which had crashed into an embankment.

Keywords:

memory-foam

Vox foot pedals, oscillatory delay

ammonium acetate

occlusion

V1 Tesla Roadster

Sentences:

After all the Shizuku Miyatsu that he had practiced, he had become serene.

The left tetrometer was fully engaged at the time of the incident.

She underwent a tetralogy of fallot surgery early in life.

The corneal gland is inflamed due to lack of lipids.

A gristmill mills grain with the assistance of vanes which are called blades.

Keywords:
1.
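The batched layout above can be rebuilt programmatically, which makes it easier to swap in real sentences later. This is a sketch with a hypothetical function name; it assumes the batch items are numbered 1., 2., … as in the tweet template (the trailing 1. suggests they were, even though the numbers don't show in this post), and that lines inside a batch are separated by single newlines, so the only blank lines fall between blocks.

```python
# Hypothetical helper: rebuilds the batched few-shot layout of the prompt above.
# Assumes batch items are numbered "1.", "2.", ... as in the tweet template.


def build_batch_prompt(instruction, examples, solved_batch, queries):
    """Assemble: instruction, one-at-a-time examples, a solved numbered batch,
    then an unsolved batch ending with an open "1." for the model to continue.

    examples / solved_batch: lists of (sentence, keywords) pairs.
    queries: sentences the model should extract keywords from.
    """
    lines = [instruction, ""]
    for sentence, keywords in examples:
        lines += [f"Sentence: {sentence}", f"Keywords: {keywords}", ""]

    lines.append("Sentences:")
    lines += [f"{i}. {s}" for i, (s, _) in enumerate(solved_batch, start=1)]
    lines.append("Keywords:")
    lines += [f"{i}. {k}" for i, (_, k) in enumerate(solved_batch, start=1)]
    lines.append("")  # one blank line separates blocks, never items

    lines.append("Sentences:")
    lines += [f"{i}. {s}" for i, s in enumerate(queries, start=1)]
    lines += ["Keywords:", "1."]
    return "\n".join(lines)


prompt = build_batch_prompt(
    "This is a terminology extractor.",
    [("Barack Obama will visit Japan on Memorial Day.", "Memorial Day")],
    [("The Amazon rainforest has been heavily damaged by ammonium acetate "
      "in the form of acid rain.", "ammonium acetate")],
    ["The corneal gland is inflamed due to lack of lipids."],
)
print(prompt)
```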

Unfortunately, in the Playground GPT-3 just returned the stop sequence, so the prompt will need at least a little adjusting.

I can think of a few adjustments.

I can increase the temperature.

I can eliminate the stop sequence.

I can provide higher-quality data. These examples are made up, but I can provide real-world sentences with actual terminology to be extracted.

I can provide even more examples.
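One possible cause, offered as a guess rather than something tested in this thread: if the batch answers in the real prompt are separated by blank lines (as they appear in this post), the examples themselves contain the two-newline stop sequence, so a model imitating that spacing gets cut off before it writes anything useful. The API stops generating at the stop sequence and leaves it out of the returned text; a sketch of that cut (the function name is hypothetical):

```python
def apply_stop(completion: str, stop: str = "\n\n") -> str:
    """Truncate generated text at the first stop sequence, mimicking how the
    API ends generation there and omits the stop text from the result."""
    index = completion.find(stop)
    return completion if index == -1 else completion[:index]


# If the examples put a blank line between answers, a model that copies that
# spacing emits the stop sequence early and the result is cut short or empty.
print(repr(apply_stop(" memory-foam\n\nVox foot pedals")))  # ' memory-foam'
print(repr(apply_stop("\n\n1. tetrometer")))                # ''
```

If that is what's happening, keeping answers on consecutive lines, or picking a stop string that never occurs inside an answer, may work better than removing the stop sequence altogether.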

I will continue to tinker with this, but in case anybody wants to collaborate, I’d be interested in hearing your suggestions.

Thanks very much.

juliushamilton100 · October 18, 2021, 5:28pm

Here’s a brief update for anyone interested in how to do this or in collaborating on it:

I decided to use @daveshapautomator’s template for a keyword extractor.

My first draft was unsuccessful, possibly because there was too little source text, so I pasted in the introduction to the Wikipedia article on topology and provided 8 example terms:

Extract terminology from the following passage:

Passage:

In mathematics, topology (from the Greek words τόπος, ‘place, location’, and λόγος, ‘study’) is concerned with the properties of a geometric object that are preserved under continuous deformations, such as stretching, twisting, crumpling, and bending; that is, without closing holes, opening holes, tearing, gluing, or passing through itself.

A topological space is a set endowed with a structure, called a topology, which allows defining continuous deformation of subspaces, and, more generally, all kinds of continuity. Euclidean spaces, and, more generally, metric spaces are examples of a topological space, as any distance or metric defines a topology. The deformations that are considered in topology are homeomorphisms and homotopies. A property that is invariant under such deformations is a topological property. Basic examples of topological properties are: the dimension, which allows distinguishing between a line and a surface; compactness, which allows distinguishing between a line and a circle; connectedness, which allows distinguishing a circle from two non-intersecting circles.

The ideas underlying topology go back to Gottfried Leibniz, who in the 17th century envisioned the geometria situs and analysis situs. Leonhard Euler’s Seven Bridges of Königsberg problem and polyhedron formula are arguably the field’s first theorems. The term topology was introduced by Johann Benedict Listing in the 19th century, although it was not until the first decades of the 20th century that the idea of a topological space was developed.

Terminology:

topology

geometric object

continuous deformation

crumpling

gluing

topological space

subspace

continuity

The engine is DaVinci, the temperature is 0, the response length is 100, and the stop sequence is “enter enter”.
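This prompt can be sketched the same way. The five-term response that follows came back despite the two-newline stop, so the terms were presumably on consecutive lines (the blank lines in this post are likely forum formatting); the sketch assumes that. The trailing newline, which invites the model to start the next term, and the function name are also assumptions.

```python
def build_passage_prompt(passage: str, example_terms: list[str]) -> str:
    """Hypothetical helper: instruction, passage, then a one-term-per-line
    seed list for the model to continue.

    Terms are separated by single newlines, so a double-newline stop sequence
    only fires once the model leaves a blank line, i.e. when it ends the list.
    """
    lines = [
        "Extract terminology from the following passage:",
        "",
        "Passage:",
        passage.strip(),
        "",
        "Terminology:",
        *example_terms,
    ]
    # Trailing newline: the model's next token starts a new term.
    return "\n".join(lines) + "\n"


prompt = build_passage_prompt(
    "In mathematics, topology is concerned with the properties of a geometric "
    "object that are preserved under continuous deformations.",
    ["topology", "geometric object", "continuous deformation"],
)
print(prompt)
```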

The response is good enough that this tool probably has some practical use:

homeomorphism

homotopy

dimension

compactness

connectedness

I was happy to find that GPT-3 produces this response consistently, which makes the tool feel predictable and reliable.
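If the output is going to be used programmatically, a small post-processing step helps. A sketch (the function name is hypothetical) that splits the completion on newlines, strips any numbering or bullets the model adds, and drops terms that merely repeat the seed list or an earlier line:

```python
import re

# Leading list markers the model might add: "1.", "2)", "-", "*", "•".
_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s*")


def parse_terms(completion: str, seed_terms: list[str]) -> list[str]:
    """Turn a one-term-per-line completion into a clean list: strip list
    markers and whitespace, drop blank lines, and skip repeats of the seed
    terms or of earlier lines (compared case-insensitively)."""
    seen = {term.casefold() for term in seed_terms}
    terms = []
    for line in completion.splitlines():
        term = _MARKER.sub("", line).strip()
        if term and term.casefold() not in seen:
            seen.add(term.casefold())
            terms.append(term)
    return terms


completion = "homeomorphism\nhomotopy\ndimension\ncompactness\nconnectedness"
print(parse_terms(completion, ["topology", "subspace", "continuity"]))
# ['homeomorphism', 'homotopy', 'dimension', 'compactness', 'connectedness']
```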

My only reservation is that I think there is more good terminology in the passage to be extracted.

Next, I’m going to try a longer passage and write more example terms for it, to see whether I can get GPT-3 to return more terminology on its own.
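A longer passage eventually runs into the context window, which the prompt and the response share (roughly 2,048 tokens for the original davinci). One way to handle that, assuming OpenAI's rule of thumb of about 4 characters per English token, is to split the text at paragraph boundaries and run the extractor once per chunk. The function name and the character budget are illustrative, not tuned values.

```python
def chunk_paragraphs(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole paragraphs (separated by blank lines) into chunks
    of at most max_chars characters. A paragraph that is longer than max_chars
    on its own becomes a single oversized chunk rather than being cut."""
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


# ~4 chars/token: 4000 chars is roughly 1000 tokens, leaving room for the
# instruction, the seed terms and a 100-token response in a ~2048-token window.
article = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
for chunk in chunk_paragraphs(article, max_chars=4000):
    print(len(chunk), repr(chunk[:30]))
```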

daveshapautomator · October 18, 2021, 5:49pm

You can also try different TOP_P values, as well as adjectives that describe the kind of terminology you want. For instance, you could specify “academic jargon”, “named entities”, or “rare terms”. Remember that GPT-3 has a bigger vocabulary than any single human, so you can use lots of adjectives.
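A sketch of turning that suggestion into a small, systematic comparison: one instruction variant per descriptor, each paired with a few TOP_P values, with everything else held fixed. The TOP_P values, the 0.7 temperature and the function name are illustrative assumptions. Temperature is set above 0 on purpose: at temperature 0 the most likely token is effectively always chosen, and that token always survives the TOP_P cut, so TOP_P would have no visible effect.

```python
from itertools import product

# Descriptors suggested above; "terminology" is the original wording.
DESCRIPTORS = ["terminology", "academic jargon", "named entities", "rare terms"]
# Illustrative TOP_P values to compare.
TOP_P_VALUES = [0.05, 0.1, 1.0]


def experiment_grid():
    """Hypothetical helper: (instruction, settings) pairs covering every
    descriptor/TOP_P combination, other settings held fixed."""
    grid = []
    for descriptor, top_p in product(DESCRIPTORS, TOP_P_VALUES):
        instruction = f"Extract {descriptor} from the following passage:"
        settings = {"engine": "davinci", "max_tokens": 100,
                    "temperature": 0.7,  # > 0 so top_p can matter (arbitrary)
                    "top_p": top_p, "stop": ["\n\n"]}
        grid.append((instruction, settings))
    return grid


for instruction, settings in experiment_grid():
    print(settings["top_p"], instruction)
```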

juliushamilton100 · October 19, 2021, 6:29am

Is there any known reason why TOP_P might have a particular effect compared with temperature, or is it just a matter of trying it and seeing what the result is?

daveshapautomator · October 19, 2021, 11:23am

TOP_P changes the distribution of tokens that it is willing to try. A higher TOP_P gives it access to more of the tokens (all of them at 1.0) and thus allows rarer terminology to be extracted. For instance, if there is an exceptionally rare term that you want to find, you may need a TOP_P of 1.0. If, however, you are looking only for mundane terms, you might want to turn it down to 0.1 or 0.05.
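This behaviour is usually called nucleus (top-p) sampling: rank the candidate next tokens by probability and keep the smallest set whose probabilities add up to at least TOP_P; the next token is then sampled from that set only. A minimal sketch with invented probabilities (a real model scores tens of thousands of tokens at every step); the function name is hypothetical.

```python
def nucleus(probs: dict[str, float], top_p: float) -> list[str]:
    """Smallest set of most-likely tokens whose cumulative probability
    reaches top_p. With top_p = 1.0 nothing is filtered out (up to
    floating-point rounding)."""
    kept, total = [], 0.0
    for token in sorted(probs, key=probs.get, reverse=True):
        kept.append(token)
        total += probs[token]
        if total >= top_p:
            break
    return kept


# Toy next-token probabilities, invented for illustration.
probs = {"space": 0.5, "set": 0.3, "manifold": 0.15, "homotopy": 0.05}
for p in (0.05, 0.7, 1.0):
    print(p, nucleus(probs, p))
# 0.05 ['space']
# 0.7 ['space', 'set']
# 1.0 ['space', 'set', 'manifold', 'homotopy']
```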

juliushamilton100 · October 20, 2021, 8:47am

I see.
So TOP_P changes the “distribution” of tokens it will try.
So we can think of tokens as the possible next pieces of a response. How specifically does the “distribution” change? Does it just admit more tokens that are rated statistically less likely?
But how does that interact with temperature?
For example, could we say that TOP_P changes the options it has, whereas temperature changes which option it picks from those (risky or conservative)?
In that case, low temperature with high TOP_P could make sense: extract unusual terms but select them with rigor and definiteness. (Admittedly, this is a vague approximation of how GPT-3 may or may not work.)
On the other hand, for a QA bot we might want low TOP_P and low temperature if we wanted the most conventional answer possible. It would also be interesting to compare low TOP_P with high temperature against low temperature with high TOP_P. Maybe the former would be accurate but expressive, and the latter more adventurous and insightful on an objective informational question, yet less certain to be correct.

I don’t know if this is right but I’m trying to get a sense for what these parameters mean.

I just tried asking GPT-3 “Who is the character Legolas?”, first with both parameters low and then with one or the other high. If TOP_P is low, changing the temperature doesn’t change the result. My guess is that it is given so few choices that a risky chooser and a conservative chooser pick the same thing.
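That observation is what you would expect if sampling works roughly like the sketch below. It assumes (a common design, but OpenAI's exact internals aren't established in this thread) that temperature rescales the logits first and the TOP_P cut is applied afterwards. The most likely token is always inside the nucleus, so once TOP_P is below that token's probability the nucleus holds only that token and temperature has nothing left to choose between. Temperature also reshapes the probabilities the TOP_P cut is computed from, so “TOP_P sets the options, temperature picks among them” is a useful but approximate picture. The logits and the function name are invented.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, top_p: float,
                 rng: random.Random) -> str:
    """Sketch of temperature + top-p sampling for a single next token.

    Assumed order: scale logits by 1/temperature, softmax, keep the top-p
    nucleus, then sample from it. temperature == 0 is treated as greedy.
    """
    if temperature == 0:
        return max(logits, key=logits.get)
    scaled = {t: v / temperature for t, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {t: math.exp(v - peak) for t, v in scaled.items()}
    norm = sum(weights.values())
    probs = {t: w / norm for t, w in weights.items()}

    kept, total = [], 0.0
    for token in sorted(probs, key=probs.get, reverse=True):
        kept.append(token)
        total += probs[token]
        if total >= top_p:
            break
    # random.choices treats the kept probabilities as relative weights.
    return rng.choices(kept, weights=[probs[t] for t in kept], k=1)[0]


# Invented logits for the first word of an answer about Legolas.
logits = {"An": 3.0, "Legolas": 2.5, "The": 1.0, "A": 0.5}
rng = random.Random(0)
for temperature in (0.2, 1.0, 2.0):
    low = {sample_token(logits, temperature, 0.01, rng) for _ in range(50)}
    high = {sample_token(logits, temperature, 1.0, rng) for _ in range(50)}
    print(temperature, sorted(low), sorted(high))
```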
