Searching 'products' using natural language querying

pedr0 · February 26, 2023, 10:40am

I’m trying to use the OpenAI APIs to search a relatively big CSV file. The file lists products and is structured like this:

id, name, color, material, price

and some sample data:

1, teddy bear, brown, polyester, 10

2, panda, black and white, cotton, 20

3, giraffe, yellow, plush, 30

I want my users to be able to search for “The Most expensive item”, “Brown bear made out of synthetic material”, “any toy that is not cotton”, “biggest toy”, “toys for 3-year-old” or “toys for boys”

It’s not possible to answer these with traditional tabular/document storage alone, because some of them need real-world knowledge that isn’t in the data, e.g. gender bias on toys.

I tried the ada embeddings model, but something like “most expensive” or “not cotton” doesn’t work with it, since ranking by vector similarity cannot handle that kind of context (comparison, negation).
I also tried using Completion. It works with a small sample set, but I need to provide the whole list every time, which is not practical given the token limits and the price.
I tried fine-tuning davinci, providing the description of the product as the prompt and the id as the result, but I received gibberish when I tried to use it, e.g. 1818 product category12 - 433065 - The178055 - 4596528622468 and so on … none of these numbers exist in my dataset.

What did I do wrong? Is there any other way?

sps · February 26, 2023, 3:15pm

Why not store it in a DB and use GPT-3 to translate the query into SQL?

There’s even an example
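
Separately from that example, here is a minimal sketch of the kind of SQL such a translation step would need to produce for the sample rows in the first post. It assumes SQLite and a table named `products`; the SQL strings in `TRANSLATED` are hand-written stand-ins for model output, not actual GPT-3 responses.

```python
import sqlite3

# Sample rows from the first post; column names follow the CSV header.
ROWS = [
    (1, "teddy bear", "brown", "polyester", 10),
    (2, "panda", "black and white", "cotton", 20),
    (3, "giraffe", "yellow", "plush", 30),
]


def build_db():
    """Load the sample rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, name TEXT, color TEXT, material TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


# ASSUMPTION: hand-written SQL standing in for what a text-to-SQL prompt
# might return for each user question.
TRANSLATED = {
    "most expensive item": "SELECT name FROM products ORDER BY price DESC LIMIT 1",
    "any toy that is not cotton": (
        "SELECT name FROM products WHERE material <> 'cotton' ORDER BY id"
    ),
}


def run(conn, question):
    """Execute the (stand-in) translated query and return matching names."""
    return [row[0] for row in conn.execute(TRANSLATED[question])]


if __name__ == "__main__":
    conn = build_db()
    for question in TRANSLATED:
        print(question, "->", run(conn, question))
```

Superlatives and negation, the two cases that failed with embeddings, become an `ORDER BY ... LIMIT 1` and a `<>` comparison once the data is in a table.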

pedr0 · February 26, 2023, 7:15pm

I don’t think you can use SQL to query something like “Toys for a 3-year-old boy” and expect it to return a toy car instead of a Barbie (gender bias), or something like “Super heroes” and expect it to return a Batman figure.

RonaldGRuckus · February 26, 2023, 7:22pm

That’s exactly the purpose of a relational database.
You can easily extract important information using simple logic.
You can also use entity extraction to convert a sentence into a database query. Keep in mind that this method won’t be perfect and will need continual training. I imagine you could tie the two together for further training/testing.

“Toys for 3-year old boy” → Extract(Age, Gender, Etc.) → select_from_database(age=3, gender=boy)
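
The second half of that pipeline could be sketched roughly as follows. This assumes the extraction step has already produced a dict such as `{"age": 3, "gender": "boy"}`; the `toys` table, its `min_age`/`max_age`/`audience` columns, and the demo rows are all hypothetical and are not part of the original CSV.

```python
import sqlite3

# ASSUMPTION: hypothetical mapping from extracted entity names to SQL
# conditions. Only whitelisted keys reach the query; values are bound as
# parameters, never interpolated into the SQL string.
CONDITIONS = {
    "age": "min_age <= :age AND max_age >= :age",
    "gender": "audience IN (:gender, 'any')",
}


def select_from_database(conn, **entities):
    """Build a parameterized query from extracted entities and run it."""
    known = {k: v for k, v in entities.items() if k in CONDITIONS}
    sql = "SELECT name FROM toys"
    if known:
        sql += " WHERE " + " AND ".join(CONDITIONS[k] for k in known)
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, known)]


def demo_db():
    """Made-up rows, purely for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE toys ("
        "id INTEGER PRIMARY KEY, name TEXT, "
        "min_age INTEGER, max_age INTEGER, audience TEXT)"
    )
    conn.executemany(
        "INSERT INTO toys VALUES (?, ?, ?, ?, ?)",
        [
            (1, "toy car", 2, 6, "boy"),
            (2, "chemistry set", 10, 14, "any"),
            (3, "building blocks", 1, 5, "any"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_db()
    print(select_from_database(conn, age=3, gender="boy"))
```

Keeping the model's job limited to filling in `age` and `gender` means a bad extraction gives a wrong filter rather than arbitrary SQL.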

pedr0 · February 26, 2023, 7:38pm

Extracting information using the APIs and using it to query the database is a good idea. I gave it a try and it looks promising.

But as you said it’s not reliable, for example, returning “Batman” from the prompt “superhero” is not straightforward.

Look at this sample. I’m looking to have something like this, but at scale with a lot of inputs

or this output from Bing:

sps · February 26, 2023, 7:44pm

The model won’t return Batman, it will return a query.

This problem is more about planning the structure of your DB than consuming the OpenAI API.

e.g. You could have an attribute category which would contain the entry superhero for the respective toys that fall in the category.

The model will generate a query, which will return all the toys with category == "superhero"

RonaldGRuckus · February 26, 2023, 7:48pm

As sps says (Happy birthday!), it’s a matter of structuring your database and creating a pipeline to manage each separate function.

Here’s a thought for your situation:
Instead of querying GPT as a database, why not use GPT to create tags for each product? You can then store the tags in your database and perform some nice analytics as well. I asked ChatGPT and this was its answer:

Create tags that relate the following item to an item store.

Item: Batman action figure

Tags:
[RESP] Batman, action figure, superhero, DC Comics, merchandise, collectibles, toy store, comic book store, pop culture.

For some reason my line separator doesn’t appear here. Another great aspect of this idea is that you would only ever need to query GPT once for each item, instead of each time someone searches for something.
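
A rough sketch of the "tag once, search many times" idea, assuming SQLite and a hypothetical `product_tags` table. The tag string is the response quoted above, pasted in as a constant; no API call is made here.

```python
import sqlite3

# The response quoted above, stored once per item.
RESPONSE = (
    "Batman, action figure, superhero, DC Comics, merchandise, collectibles, "
    "toy store, comic book store, pop culture."
)


def parse_tags(response):
    """Split a comma-separated tag response into lowercase tags."""
    return [t.strip().rstrip(".").lower() for t in response.split(",") if t.strip()]


def make_db():
    """ASSUMPTION: hypothetical two-table layout, one row per (product, tag)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE product_tags (product_id INTEGER, tag TEXT)")
    return conn


def index_product(conn, product_id, name, response):
    """Store the product and its generated tags (done once per item)."""
    conn.execute("INSERT INTO products (id, name) VALUES (?, ?)", (product_id, name))
    conn.executemany(
        "INSERT INTO product_tags (product_id, tag) VALUES (?, ?)",
        [(product_id, tag) for tag in parse_tags(response)],
    )


def search_by_tag(conn, tag):
    """Look up products by tag; no model call is needed at search time."""
    rows = conn.execute(
        "SELECT p.name FROM products p "
        "JOIN product_tags t ON t.product_id = p.id "
        "WHERE t.tag = ? ORDER BY p.id",
        (tag.lower(),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = make_db()
    index_product(conn, 1, "Batman action figure", RESPONSE)
    print(search_by_tag(conn, "superhero"))
```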

sps · February 26, 2023, 7:53pm

Thanks! And yes, it’s a great idea to generate tags for each product. The goal is to solve the problem, not to consume the API.

pedr0 · February 26, 2023, 8:19pm

Extracting tags using GPT is a great idea. Combined with some local filters to remove repetitive tags and reject unwanted ones, it’s probably a very useful and cheap way to make the search better.
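
Such a local filter might look something like this. The reject list is a made-up example of store-level terms that would match nearly every product; the normalization rules are assumptions too.

```python
# ASSUMPTION: hypothetical reject list of tags too generic to help search.
REJECT = {"merchandise", "toy store", "pop culture"}


def clean_tags(raw_tags, reject=REJECT):
    """Normalize tags, drop rejected ones, and remove repeats (order kept)."""
    seen = set()
    cleaned = []
    for tag in raw_tags:
        # Lowercase, collapse runs of whitespace, drop a trailing period.
        norm = " ".join(tag.lower().split()).rstrip(".")
        if not norm or norm in reject or norm in seen:
            continue
        seen.add(norm)
        cleaned.append(norm)
    return cleaned


if __name__ == "__main__":
    print(clean_tags(["Batman", "batman ", "Superhero", "merchandise", "pop culture."]))
```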

It seems that my idea of searching a large set of unstructured data is not achievable with the current APIs, at least not cheaply.
