Need help identifying products in sentences using OpenAI API

I’m facing a challenge in my project where I need to determine whether a given sentence contains products or not. I’ve tried using the OpenAI API, but unfortunately I’m not getting accurate results.
Here are some examples of the sentences I’m dealing with:

Sign in to your account
Find & reorder past purchases
New to Amazon.in? Create an account
Up to 50% off | Skincare and haircare essentials redefine beauty
Enjoy all the benefits of Prime
Air conditioners
Up to 50% off
Fruits
See more
Explore all
Up to 40% off
iPhone 13
Up to 26% off
Up to 53% off

I want to be able to identify if any of these sentences contain references to products. As mentioned, I’m new to NLP, and I would greatly appreciate any guidance or suggestions on how to approach this problem effectively.

Here is the code I have used:

import openai

def contains_product_name(sentence):
    openai.api_key = 'api-key'

    prompt = f"Does the following sentence contain a product name? Answer Yes or No.\nText: {sentence}\nAnswer:"

    response = openai.Completion.create(
        engine="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=5,
        temperature=0
    )

    # The completion response has no `score` attribute to compare against
    # a threshold; instead, check the model's Yes/No answer directly.
    generated_text = response.choices[0].text.strip()
    return generated_text.lower().startswith("yes")

# Example usage
sentence = "Up to 68% off"
contains_product = contains_product_name(sentence)
if contains_product:
    print("Yes, the sentence contains a product name.")
else:
    print("No, the sentence does not contain a product name.")

Hi @goelabhishk - Personally, I think this would be a good use case for fine-tuning.

You can read up here on fine-tuning:

https://platform.openai.com/docs/guides/fine-tuning
https://platform.openai.com/docs/api-reference/fine-tuning

Let me know if you have any further specific questions.
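In case it helps to see the shape of the training data: fine-tuning expects a JSONL file of chat-formatted examples, one per line. A minimal sketch, assuming made-up sentences and labels (the filename and system prompt are illustrative, not real training data):

```python
import json

# Illustrative labeled examples for a product-detection classifier.
labeled_examples = [
    ("iPhone 13", "yes"),
    ("Sign in to your account", "no"),
    ("Air conditioners", "yes"),
    ("Up to 50% off", "no"),
]

def to_chat_record(sentence, label):
    """Format one labeled example in the chat fine-tuning format."""
    return {
        "messages": [
            {"role": "system", "content": "Answer yes or no: does the text mention a product?"},
            {"role": "user", "content": sentence},
            {"role": "assistant", "content": label},
        ]
    }

# Write one JSON object per line, as the fine-tuning endpoint expects.
with open("product_train.jsonl", "w") as f:
    for sentence, label in labeled_examples:
        f.write(json.dumps(to_chat_record(sentence, label)) + "\n")
```

You would then upload the file and start a fine-tuning job as described in the guide linked above.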

Thank you so much. This is looking very helpful. I will try this and let you know.

1 Like

And just to throw it in the mix: you could technically also use embeddings. That said, if you are new to NLP and not yet familiar with embeddings, then the fine-tuned model is likely the more straightforward approach for the time being.
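To make the embeddings idea concrete: you embed a handful of labeled reference sentences once, then classify new text by nearest cosine similarity. The tiny vectors below are made-up stand-ins for real embeddings you would get from the embeddings endpoint (e.g. text-embedding-3-small); only the similarity logic is the point here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Stand-in embeddings for labeled reference sentences (illustrative values).
references = [
    ("iPhone 13", "product", [0.9, 0.1, 0.2]),
    ("Sign in to your account", "not_product", [0.1, 0.9, 0.3]),
]

def classify(vector):
    """Label new text by its most similar labeled reference."""
    best = max(references, key=lambda r: cosine_similarity(vector, r[2]))
    return best[1]

# A new sentence whose (stand-in) embedding sits near "iPhone 13".
print(classify([0.8, 0.2, 0.1]))  # product
```

With real embeddings you would replace the hand-written vectors with API calls, and likely compare against many more labeled examples.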

1 Like

Trying out a new way to help users in the forum by recording a video on how I would solve this problem @goelabhishk

Hope this helps.

https://www.youtube.com/watch?v=VdpneLeBkaE

1 Like

OpenAI used to have a Classifications endpoint, but it was deprecated. However, their transition documentation talks about using fine-tuning and embeddings as alternatives, like other people in the thread suggested.

2 Likes

The type of task you are specifying is called entity extraction.

You are looking for a particular type of object or phrase within text.

This is how it can be done with a chat completions API call:

Here is example code to make a chat completions request, where I have made a function to also clean up the junk that gpt-3.5-turbo-0125 will put out instead of valid JSON.

import json
import re
from openai import OpenAI
client = OpenAI()

my_text = """
Today you can purchase the Retro Encabulator for only $19.99!
""".strip()

response = client.chat.completions.create(
  model="gpt-3.5-turbo",
  messages=[
    {
      "role": "system",
      "content": """
You are an entity extractor.
You identify if the text contains named products that a company makes that are the type of items that can be purchased.

// response
- only valid JSON is produced, double quotes around keys and strings.
- no markdown allowed (e.g.``` is prohibited).
- format: JSON, with the following keys:
-- key: "contains_products": boolean,
-- key: "extracted_products": array of strings.
""".strip()
    },
    {
      "role": "user",
      "content": f"Extract products:\n\n{my_text}"
    }
  ],
  temperature=0.5,
  max_tokens=256,
  top_p=0.5,
  frequency_penalty=0,
  presence_penalty=0
)
response_str = response.choices[0].message.content

def parse_json_string(input_str):
    # First, try to load the string as is to check if it's already a valid JSON
    try:
        return json.loads(input_str)  # Return the Python data object directly
    except json.JSONDecodeError:
        # If an exception is thrown, attempt to clean the input
        # This regex looks for the JSON object starting with '{' and ending with '}'
        # and ignores anything outside of it
        match = re.search(r'{.*}', input_str, re.DOTALL)
        if match:
            cleaned_str = match.group(0)
            return json.loads(cleaned_str)  # Return the Python data object after cleaning
        else:
            raise ValueError("Input does not contain a valid JSON object.")

response_data = parse_json_string(response_str)

# Printing a natural language report
if response_data.get("contains_products"):  # This checks for True explicitly
    products = response_data.get("extracted_products", [])
    if products:  # Check if the list is not empty
        print(f"Products were found! Here's a listing:\n- " + "\n- ".join(products))
    else:
        print("Products are indicated as found, but the list is empty.")
else:
    print("No products were found in the text.")
print(f"\n---\nAPI said:\n{response_data}")

Response printout:

Products were found! Here's a listing:
- retro encabulator

---
API said:
{'contains_products': True, 'extracted_products': ['retro encabulator']}

I hope that my one-on-one consulting to develop code that exactly meets your needs was a satisfactory experience.

2 Likes

Your video was really very helpful. Thanks a lot.

1 Like

Really helpful answer. Thanks for the reply.

Can you help me with some article or blog that help me learn embeddings?

https://platform.openai.com/docs/guides/embeddings

2 Likes

Appreciate it. I will try and give back more to the community here through videos!

1 Like

Hi, sorry to bother you. I have made a function to identify the product name and its category, but I want to run it on 1.5 lakh (150,000) items, and after 700 items I got an error -

RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-4yoElpWysEHGS0NRIdabJR2l on tokens per min (TPM): Limit 60000, Used 59716, Requested 801. Please try again in 517ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}

What should I do? How can I avoid this? Can anybody help me with this?

@idonotwritecode @Diet

Hey, so you just hit your rate limit - that's all.

You can space your API calls a bit wider apart.

Further, as you consume more resources, your usage tier goes up, increasing your rate limit.

Other ways to combat this:
a) write a rule so that if one API call fails, you fall back to another API key (from another OpenAI account)
b) batch multiple items into a single call - you could potentially do about 100 to 1000 products in one API call (GPT-4)
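For the spacing-out part, a common pattern is retry with exponential backoff. A minimal sketch, where `call_api` is a stand-in that fails twice before succeeding; in real use it would wrap the `client.chat.completions.create(...)` call and catch `openai.RateLimitError` instead of `RuntimeError`:

```python
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn, doubling the wait after each rate-limit failure."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for openai.RateLimitError
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Stand-in API call that raises a (fake) rate-limit error twice.
calls = {"n": 0}
def call_api():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429: rate limit reached")
    return "ok"

print(with_backoff(call_api, base_delay=0.01))  # ok
```

The doubling delay means you wait just long enough for the per-minute token budget to refill rather than hammering the endpoint.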

1 Like

Can I also create multiple API keys from the same account? Will that work?

No - keys from the same account share the same rate limit. But you can switch the model within your account.

You can also create another OpenAI account and use its API key as a backup.

Actually, I have the data in Excel, so I am using pandas to read the text from one column and save the output into another column. How should I send 1,000 items in one go and match the output back to the dataframe?
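One possible sketch of that batching workflow: chunk the column, send each chunk in a single call, and write the results back by position so they line up with the rows. `classify_batch` here is a hypothetical stand-in for the API call; a real version would send the numbered sentences in one chat completion and parse a JSON array of labels out of the reply.

```python
import pandas as pd

def classify_batch(sentences):
    # Stand-in classifier for illustration only: a real version would make
    # one API call for the whole chunk and return one label per sentence.
    return ["product" if any(ch.isdigit() for ch in s) else "other" for s in sentences]

df = pd.DataFrame({"text": ["iPhone 13", "Sign in", "Up to 50% off", "Fruits"]})

batch_size = 2  # in practice perhaps 100+ sentences per call
labels = []
for start in range(0, len(df), batch_size):
    chunk = df["text"].iloc[start:start + batch_size].tolist()
    labels.extend(classify_batch(chunk))

# Chunks are processed in row order, so the outputs line up with the rows.
df["label"] = labels
print(df)
```

Combined with the backoff retry above, this keeps you well under the tokens-per-minute limit while still covering the full dataset.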