The type of task you are specifying is called entity extraction.
You are looking for a particular type of object or phrase within text.
This is how it can be done with a chat completions API call. The example code below makes the request, and I have also made a function to clean up the junk that gpt-3.5-turbo-0125 can put out instead of valid JSON.
import json
import re
from openai import OpenAI

client = OpenAI()

my_text = """
Today you can purchase the Retro Encabulator for only $19.99!
""".strip()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": """
You are an entity extractor.
You identify if the text contains named products that a company makes that are the type of items that can be purchased.
// response
- only valid JSON is produced, double quotes around keys and strings.
- no markdown allowed (e.g.``` is prohibited).
- format: JSON, with the following keys:
-- key: "contains_products": boolean,
-- key: "extracted_products": array of strings.
""".strip()
        },
        {
            "role": "user",
            "content": f"Extract products:\n\n{my_text}"
        }
    ],
    temperature=0.5,
    max_tokens=256,
    top_p=0.5,
    frequency_penalty=0,
    presence_penalty=0
)
response_str = response.choices[0].message.content

def parse_json_string(input_str):
    # First, try to load the string as is to check if it's already a valid JSON
    try:
        return json.loads(input_str)  # Return the Python data object directly
    except json.JSONDecodeError:
        # If an exception is thrown, attempt to clean the input
        # This regex looks for the JSON object starting with '{' and ending with '}'
        # and ignores anything outside of it
        match = re.search(r'{.*}', input_str, re.DOTALL)
        if match:
            cleaned_str = match.group(0)
            return json.loads(cleaned_str)  # Return the Python data object after cleaning
        else:
            raise ValueError("Input does not contain a valid JSON object.")

response_data = parse_json_string(response_str)
# Printing a natural language report
if response_data.get("contains_products"):  # Truthy check; a missing key or False falls through to the else branch
    products = response_data.get("extracted_products", [])
    if products:  # Check if the list is not empty
        print("Products were found! Here's a listing:\n- " + "\n- ".join(products))
    else:
        print("Products are indicated as found, but the list is empty.")
else:
    print("No products were found in the text.")

print(f"\n---\nAPI said:\n{response_data}")
Response printout:
Products were found! Here's a listing:
- retro encabulator

---
API said:
{'contains_products': True, 'extracted_products': ['retro encabulator']}
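If you want to see what the cleanup function does without making an API call, here is a small offline sketch. It repeats the same tolerant parser and feeds it a made-up reply wrapped in chatter and a markdown fence, which is the kind of junk it is meant to survive. The `validate_extraction` helper and the `simulated_reply` text are my own additions for illustration, not something the API provides; the helper just checks the two keys the system prompt asks for.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so the example itself contains no literal fence


def parse_json_string(input_str):
    """Tolerant parser: try the string as-is, then fall back to the outermost {...}."""
    try:
        return json.loads(input_str)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", input_str, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise ValueError("Input does not contain a valid JSON object.")


def validate_extraction(data):
    """Hypothetical helper: confirm the shape requested in the system prompt."""
    if not isinstance(data, dict):
        raise ValueError("Top-level JSON value is not an object.")
    if not isinstance(data.get("contains_products"), bool):
        raise ValueError('"contains_products" must be a boolean.')
    products = data.get("extracted_products")
    if not isinstance(products, list) or not all(isinstance(p, str) for p in products):
        raise ValueError('"extracted_products" must be an array of strings.')
    return data


# Made-up model reply (an assumption for the demo, not real API output)
simulated_reply = (
    "Sure! Here is the JSON:\n"
    f"{FENCE}json\n"
    '{"contains_products": true, "extracted_products": ["Retro Encabulator"]}\n'
    f"{FENCE}"
)

result = validate_extraction(parse_json_string(simulated_reply))
print(result)
```

Checking the shape before you use the data means a reply with a missing key or a string where the boolean should be fails loudly, instead of quietly reporting "No products were found".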
I hope that my one-on-one consulting to develop code that exactly meets your needs was a satisfactory experience.
