Need help identifying products in sentences using OpenAI API

The type of task you are specifying is called entity extraction.

You are looking for a particular type of object or phrase within text.

This is how it can be done with a chat completions API call. The example code below makes the request and includes a function that cleans up the junk that gpt-3.5-turbo-0125 can put out instead of valid JSON.

import json
import re
from openai import OpenAI
client = OpenAI()

my_text = """
Today you can purchase the Retro Encabulator for only $19.99!
""".strip()

response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {
      "role": "system",
      "content": """
You are an entity extractor.
You identify if the text contains named products that a company makes that are the type of items that can be purchased.

// response
- only valid JSON is produced, double quotes around keys and strings.
- no markdown allowed (e.g.``` is prohibited).
- format: JSON, with the following keys:
-- key: "contains_products": boolean,
-- key: "extracted_products": array of strings.
""".strip()
    },
    {
      "role": "user",
      "content": f"Extract products:\n\n{my_text}"
    }
  ],
  temperature=0.5,
  max_tokens=256,
  top_p=0.5,
  frequency_penalty=0,
  presence_penalty=0
)
response_str = response.choices[0].message.content

def parse_json_string(input_str):
    # First, try to load the string as is to check if it's already a valid JSON
    try:
        return json.loads(input_str)  # Return the Python data object directly
    except json.JSONDecodeError:
        # If an exception is thrown, attempt to clean the input
        # This regex looks for the JSON object starting with '{' and ending with '}'
        # and ignores anything outside of it
        match = re.search(r'{.*}', input_str, re.DOTALL)
        if match:
            cleaned_str = match.group(0)
            return json.loads(cleaned_str)  # Return the Python data object after cleaning
        else:
            raise ValueError("Input does not contain a valid JSON object.")

response_data = parse_json_string(response_str)

# Printing a natural language report
if response_data.get("contains_products"):  # Truthy check: a missing key or false falls through to else
    products = response_data.get("extracted_products", [])
    if products:  # Check if the list is not empty
        print("Products were found! Here's a listing:\n- " + "\n- ".join(products))
    else:
        print("Products are indicated as found, but the list is empty.")
else:
    print("No products were found in the text.")
print(f"\n---\nAPI said:\n{response_data}")

Response printout:

Products were found! Here's a listing:
- retro encabulator

---
API said:
{'contains_products': True, 'extracted_products': ['retro encabulator']}
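
If you want to see the cleanup step work without spending API calls, here is a rough offline sketch. It repeats `parse_json_string` from the code above (with the braces escaped in the regex) so it runs on its own. The two sample replies are ones I made up to imitate typical problems, a markdown fence and a chatty preamble, and `check_shape` is a hypothetical helper for illustration, not part of the openai library.

```python
import json
import re


def parse_json_string(input_str):
    """Try strict parsing first, then fall back to the outermost {...} span."""
    try:
        return json.loads(input_str)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", input_str, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise ValueError("Input does not contain a valid JSON object.")


def check_shape(data):
    """Hypothetical helper: confirm the keys and types the system prompt asked for."""
    return (
        isinstance(data, dict)
        and isinstance(data.get("contains_products"), bool)
        and isinstance(data.get("extracted_products"), list)
        and all(isinstance(p, str) for p in data["extracted_products"])
    )


# Made-up replies imitating common output problems (assumptions, not real API output).
fenced_reply = '```json\n{"contains_products": true, "extracted_products": ["Retro Encabulator"]}\n```'
chatty_reply = 'Sure! Here is the JSON:\n{"contains_products": false, "extracted_products": []}'

for reply in (fenced_reply, chatty_reply):
    data = parse_json_string(reply)
    print(check_shape(data), data)
```

Checking the shape before reading the keys means a reply that parses as JSON but has the wrong structure gets caught up front instead of producing a misleading report.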

I hope that my one-on-one consulting to develop code that exactly meets your needs was a satisfactory experience.
