I’m facing a challenge in my project where I need to determine whether a given sentence contains products or not. I’ve tried using the OpenAI API, but unfortunately I’m not getting accurate results.
Here are some examples of the sentences I’m dealing with:
Sign in to your account
Find & reorder past purchases
New to Amazon.in? Create an account
Up to 50% off | Skincare and haircare essentials redefine beauty
Enjoy all the benefits of Prime
Air conditioners
Up to 50% off
Fruits
See more
Explore all
Up to 40% off
iPhone 13
Up to 26% off
Up to 53% off
I want to be able to identify if any of these sentences contain references to products. I’m new to NLP, and I would greatly appreciate any guidance or suggestions on how to approach this problem effectively.
Here is the code I have used
import openai

def contains_product_name(sentence):
    openai.api_key = 'api-key'
    # Ask for a Yes/No answer; the Completion response has no `score` field,
    # so the decision has to come from the generated text itself
    prompt = f"Does the following sentence contain a product name? Answer Yes or No.\nText: {sentence}\nAnswer:"
    response = openai.Completion.create(
        engine="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=30
    )
    generated_text = response.choices[0].text.strip()
    return generated_text.lower().startswith("yes")
# Example usage
sentence = "Up to 68% off"
contains_product = contains_product_name(sentence)
if contains_product:
    print("Yes, the sentence contains a product name.")
else:
    print("No, the sentence does not contain a product name.")
And just to throw it in the mix: you could technically also use embeddings. That said, if you are new to NLP and not yet familiar with embeddings, then the fine-tuned model is likely the more straightforward approach for the time being.
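Since embeddings came up: here is a minimal sketch of what an embeddings-based classifier could look like, using nearest-centroid classification over a few labelled example sentences. The seed sentences, the embedding model name (`text-embedding-3-small`), and the helper names are my assumptions, not anything from the original post; treat it as an illustration of the idea rather than a finished solution.

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def centroid(vectors):
    # Element-wise mean of a list of equal-length vectors
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def embed(client, texts, model="text-embedding-3-small"):
    # One embeddings call returns one vector per input string
    resp = client.embeddings.create(model=model, input=texts)
    return [d.embedding for d in resp.data]

def contains_product(client, sentence, product_examples, non_product_examples):
    # Nearest-centroid classification: embed the labelled examples,
    # then assign the sentence to whichever centroid it is closer to
    vecs = embed(client, product_examples + non_product_examples + [sentence])
    n = len(product_examples)
    prod_centroid = centroid(vecs[:n])
    other_centroid = centroid(vecs[n:-1])
    target = vecs[-1]
    return cosine(target, prod_centroid) > cosine(target, other_centroid)

if __name__ == "__main__":
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    product_examples = ["iPhone 13", "Air conditioners", "Fruits"]
    non_product_examples = ["Sign in to your account", "See more", "Up to 40% off"]
    print(contains_product(client, "Up to 68% off",
                           product_examples, non_product_examples))
```

In practice you would embed the labelled examples once and cache the centroids, rather than re-embedding them on every call.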
OpenAI used to have a Classifications endpoint, but it was deprecated. However, their transition documentation talks about using fine-tuning and embeddings as alternatives, like other people in the thread suggested.
Here is example code to make a chat completions request, where I have made a function to also clean up the junk that gpt-3.5-turbo-0125 will put out instead of valid JSON.
import json
import re
from openai import OpenAI

client = OpenAI()

my_text = """
Today you can purchase the Retro Encabulator for only $19.99!
""".strip()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": """
You are an entity extractor.
You identify if the text contains named products that a company makes that are the type of items that can be purchased.
// response
- only valid JSON is produced, double quotes around keys and strings.
- no markdown allowed (e.g. ``` is prohibited).
- format: JSON, with the following keys:
-- key: "contains_products": boolean,
-- key: "extracted_products": array of strings.
""".strip()
        },
        {
            "role": "user",
            "content": f"Extract products:\n\n{my_text}"
        }
    ],
    temperature=0.5,
    max_tokens=256,
    top_p=0.5,
    frequency_penalty=0,
    presence_penalty=0
)
response_str = response.choices[0].message.content

def parse_json_string(input_str):
    # First, try to load the string as is to check if it's already valid JSON
    try:
        return json.loads(input_str)  # Return the Python data object directly
    except json.JSONDecodeError:
        # If an exception is thrown, attempt to clean the input.
        # This regex looks for the JSON object starting with '{' and ending
        # with '}' and ignores anything outside of it
        match = re.search(r'{.*}', input_str, re.DOTALL)
        if match:
            cleaned_str = match.group(0)
            return json.loads(cleaned_str)  # Return the data object after cleaning
        else:
            raise ValueError("Input does not contain a valid JSON object.")

response_data = parse_json_string(response_str)

# Printing a natural language report
if response_data.get("contains_products"):  # This checks for True explicitly
    products = response_data.get("extracted_products", [])
    if products:  # Check if the list is not empty
        print("Products were found! Here's a listing:\n- " + "\n- ".join(products))
    else:
        print("Products are indicated as found, but the list is empty.")
else:
    print("No products were found in the text.")

print(f"\n---\nAPI said:\n{response_data}")
Response printout:
Products were found! Here's a listing:
- retro encabulator

---
API said:
{'contains_products': True, 'extracted_products': ['retro encabulator']}
I hope that my one-on-one consulting to develop code that exactly meets your needs was a satisfactory experience.
Hi, sorry to bother you. I have made a function to identify a product name and its category, but I want to run it on 1.5 lakh (150,000) items, and after about 700 items I got an error -
RateLimitError: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-3.5-turbo in organization org-4yoElpWysEHGS0NRIdabJR2l on tokens per min (TPM): Limit 60000, Used 59716, Requested 801. Please try again in 517ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
What should I do? How can I avoid this? Can anybody help me?
Other ways to combat this:
a) write a rule so that if one API call fails, you fall back to another API key (from another OpenAI account)
b) do multiple items in a single call - you could potentially do about 100 to 1000 products in one API call (GPT-4)
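The standard companion to both of the above is retry-with-exponential-backoff: catch the 429, wait, and try again with a growing delay. A minimal sketch (the helper name is mine; which exception class to catch depends on your `openai` library version, e.g. `openai.RateLimitError` in the 1.x client):

```python
import random
import time

def with_backoff(fn, max_retries=6, base_delay=1.0, retry_on=(Exception,)):
    # Call fn(); on a retryable error, sleep with exponential backoff
    # plus jitter and try again, up to max_retries attempts.
    # Pass retry_on=(openai.RateLimitError,) to retry only on 429s.
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.25)
            time.sleep(delay)
```

You would wrap each API call like `with_backoff(lambda: classify(item), retry_on=(openai.RateLimitError,))`; the jitter keeps many parallel workers from all retrying at the same instant.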
Actually, I have the data in Excel, so I am using pandas to read the text of one column and save the output into another column. How should I send 1000 items in one go and match the output back to the dataframe?
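One way to line batches back up with the dataframe (a sketch; the column names, the batch size, and the `classify_batch` helper are all assumptions): number the items inside each prompt, ask the model to return a JSON array of the same length in the same order, then assign results back by position. The chunking itself is simple:

```python
def chunked(items, size):
    # Split a list into consecutive batches of at most `size` items,
    # preserving the original order
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical usage with pandas (column names are assumptions):
# import pandas as pd
# df = pd.read_excel("items.xlsx")
# results = []
# for batch in chunked(df["item_text"].tolist(), 100):
#     # Build one prompt listing the batch, call the API, and parse a JSON
#     # array with exactly one entry per item, in batch order.
#     # classify_batch is yours to write; validate len(output) == len(batch)
#     results.extend(classify_batch(batch))
# df["category"] = results  # positions line up because order is preserved
```

The one thing to validate on every call is that the model returned exactly as many entries as you sent; if the lengths ever disagree, re-run that batch rather than guessing at the alignment.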