By using OpenAI Chat Completion Python API, I am trying to create a summary of article titles for a newsfeed.
Let say I have 100 ‘Basketball’ news and I want to generate a summary by using these news article titles.
My prompt is like this:
<Act as a journalist and summarise the most important ‘Sports’ news related to ‘Basketball’, in 150 to 200 words based on the article titles below delimited by triple quotes.
“”"
some basketball news
some transfer
a new saloon is built
so on…
“”">
However, it is not able to summarise sometimes, although there are quite a lot of information related to ‘Basketball’ and it answers back as ‘There is no most important Sports news related to Basketball in the provided articles’
Where is it supposed to get that summary? You want it to imagine? Hallucinate?
Then you have ambiguous classification phrasing.
I can just ask ChatGPT (using copypasta input that ChatGPT has sanitized already for me), and here it doesn’t go into the realm of pure fiction to create the summary.
It also would need more information for many cases of “based on the article titles”: it might not know what sport a drafted player represents (but got an ambiguous one of mine correct)
The code is something like this (might not be syntactically correct). Despite, I am sharing the prompt without changing:
def generate(self, prompt: str, news: list) -> str:
prompt = f'Summarise in 150 to 200 words the most important "Industry" news related to topic
"Electronical Fraud", based on articles below delimited by triple quotes.
Do not generate a summary in list format. Create paragraphs instead.
Only provide the generated summary as the response, do not start your responses with an introductory sentence.
If there is no important "Industry" news related to the topic "Electronical Fraud" in the given articles,
please provide only the word "EMPTY" as your response.
\"\"\"
- Obasanjo,IBB Among 25 World’s Richest Politicians
- The Latest: Trump expected to shift toward economy
- Waking Up In Hillary Clinton’s America
- Before Trump, Sen Bulworth Spoke Truth To Power
- Fox stars privately bashed election fraud claims the network pushed
- Rupert Murdoch says some Fox News anchors ‘endorsed’ false election fraud claims in Dominion case
- Republicans Exit ERIC, Citing Voter Fraud Potential
\"\"\"
'
try:
messages = [{'role': 'system', 'content': 'Act like a journalist and generate a complete answer in 150 to 200 words.'},
{'role': 'user', 'content': prompt}]
completion = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=messages,
temperature=0.8,
max_tokens=300)
result = completion.choices[0].message.content
return '' if result == 'EMPTY' else result
except openai.OpenAIError as e:
logger.error(f'Error occurred while initializing OpenAI API: {e}')
return None
But, about this, you are right indeed. I am aware I am asking to create a summary from titles, which doesn’t carry enough information.
However, it is impossible to predict if I will have a response or not. For the same article titles, sometimes it returns a very valid summary, sometimes I get nothing.
The unreliable response is likely your mishmash of instructions between system and user roles. The needless triple-quotes that don’t delineate the actual headlines as you say they should.
I would articulate clearly in the system role that it is going to operate on user-provided data in the particular way, and only provide the data in the user message.
Chat models should be reliable when the message containers are filled. I just had a babbage case where it returns nothing for a particular input without putting some more carriage returns after the single unlabeled input for completion (and then acted more like an insertions model).
About the system and user messages, do you suggest something like this?:
System: You will be given some news article titles. Act like a journalist and summarise them in 150 to 200 words. Do not generate a summary in list format. Create paragraphs instead. Only provide the generated summary as the response, do not start your responses with an introductory sentence. If you cannot generate a summary, output the word ‘EMPTY’ as your response.
*User: * - title 1 - title 2 - so on…
About the EMPTY… I have asked this question before and when you instruct with respond with empty string, do not respond with anything etc… it just doesn’t follow. This suggestion worked pretty well so far.
Also, about the system role, if you have checked/followed other discussions in the platform, it is ‘weaker’ than user input, and currently do not behave as intended.
That’s the idea. Like code, we have instruction and then we have data.
The performance of the AI can be improved more by telling it exactly what it is actually doing instead of “act like a journalist”. Without fully understanding where you are using this programmatically, we can assume, and write an example system instruction better:
You are the AI processor for my website, which is a news aggregator and only shows basketball stories extracted from a general news feed data source. Additionally, while the news feed only gives headlines written in a terse style, we want to provide the reader with an expanded summary that uses natural language. For each headline that is provided in a list of headlines in the user role message, you will only act on the headline if the language strongly indicates the category is basketball. Then, for each meeting this basketball topic criteria, you will then synthesize and output a paragraph-length summary of the news story, with each story separated by carriage returns. If no story meets the criteria, you will indicate this to backend code by only outputting the text “no basketball news”.
(other code output specifications, other instructions about length and how many stories to consider…)
just as a thing to try not sure about your single ’ f line, try this
prompt = f'''Summarise in 150 to 200 words the most important "Industry" news related to topic
"Electronical Fraud", based on articles below delimited by triple quotes.
Do not generate a summary in list format. Create paragraphs instead.
Only provide the generated summary as the response, do not start your responses with an introductory sentence.
If there is no important "Industry" news related to the topic "Electronical Fraud" in the given articles,
please provide only the word "EMPTY" as your response.
\"\"\"
- Obasanjo,IBB Among 25 World’s Richest Politicians
- The Latest: Trump expected to shift toward economy
- Waking Up In Hillary Clinton’s America
- Before Trump, Sen Bulworth Spoke Truth To Power
- Fox stars privately bashed election fraud claims the network pushed
- Rupert Murdoch says some Fox News anchors ‘endorsed’ false election fraud claims in Dominion case
- Republicans Exit ERIC, Citing Voter Fraud Potential
\"\"\"
'''
Back in march it would have just acted as a journalist. now you pretty much have to tell it how a journalist acts in great detail. this is not a problem with the users prompt this is due to the deliberate degradation of performance due to unknown reasons
How to select the most newsworthy stories for a basketball fan newsletter:
Each week we mail out the most interesting and newsworthy stories to our subscribers. We select only those stories which will be most interesting and enjoyable for basketball fans.
We usually select 5-8 stories from an original list of 20 news stories from the world of basketball.
Original list:
1. Spain announces extended roster for FIBA Basketball World Cup 2023 2. Tom Segura basketball slam dunk injury video resurfaces in wake of Netflix special ‘Sledgehammer’ 3. Doncic leads Slovenia’s extended roster for FIBA Basketball World Cup 4. NBA announces new COVID-19 safety protocols for players and staff 5. LeBron James surpasses Kareem Abdul-Jabbar’s all-time scoring record 6. High school basketball player breaks state record for most points in a single game 7. WNBA announces new pay structure and benefits for players 8. NBA legend Michael Jordan invests in eSports company 9. New basketball training facility opens in Los Angeles 10. NBA player’s controversial tweet sparks debate about mental health in sports 11. College basketball team disqualified from tournament due to COVID-19 outbreak 12. NBA All-Star game to be held without fans due to pandemic 13. NBA player’s shoe line sells out in minutes 14. Basketball coach fired for inappropriate behavior 15. NBA team trades star player in surprising move 16. Basketball player comes out as first openly gay in NBA history 17. NBA player’s charity donates $1 million to local schools 18. NBA team wins championship after dramatic final game 19. NBA player retires after 20-year career 20. Basketball player’s autobiography becomes bestseller
Selected stories:
Spain announces extended roster for FIBA Basketball World Cup 2023 - This story is relevant for fans who follow international basketball and are interested in the upcoming World Cup.
Doncic leads Slovenia’s extended roster for FIBA Basketball World Cup - This story is interesting because it features a popular NBA player, Luka Doncic, who will be playing for his home country in the World Cup.
LeBron James surpasses Kareem Abdul-Jabbar’s all-time scoring record - This is a significant milestone in basketball history and will be of interest to all basketball fans.
NBA team trades star player in surprising move - This story will be of interest to fans who follow the NBA closely and are interested in player trades and team strategies.
NBA team wins championship after dramatic final game - This is a major event in the basketball world and will be of interest to all fans.
Basketball player comes out as first openly gay in NBA history - This is a significant moment in sports history and will be of interest to fans who support diversity and inclusion in sports.
NBA player’s charity donates $1 million to local schools - This story highlights the positive impact that basketball players can have off the court, which can be inspiring for fans.
Basketball player’s autobiography becomes bestseller - This story will be of interest to fans who enjoy learning more about the personal lives and careers of basketball players.