Searching 'products' using natural language querying

I’m trying to use the OpenAI APIs to search a relatively big CSV file. The file contains products and is structured like this:

id, name, color, material, price

and some sample data:

1, teddy bear, brown, polyester, 10

2, panda, black and white, cotton, 20

3, giraffe, yellow, plush, 30

I want my users to be able to search for “The Most expensive item”, “Brown bear made out of synthetic material”, “any toy that is not cotton”, “biggest toy”, “toys for 3-year-old” or “toys for boys”

It’s not possible to answer these queries from traditional tabular/document storage alone, because they depend on real-world knowledge that isn’t in the data, e.g. gender bias in toys.

  1. I tried the ada embedding model, but queries like “most expensive” or “not cotton” don’t work with it, since vector similarity can’t capture that kind of context.

  2. I also tried using Completion. It works with a small sample set, but I need to provide the whole list every time, which is not practical given the token limits and the price.

  3. I tried fine-tuning davinci, providing the description of the product as the prompt and the id as the result, but I received gibberish when I tried to use it, e.g. 1818 product category12 - 433065 - The178055 - 4596528622468 and so on … None of these numbers exist in my dataset.

What did I do wrong? Is there any other way?


Why not store the data in a DB and use GPT-3 to translate the query into SQL?

There’s even an example
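This is not the linked example, just a rough sketch of the idea. It assumes the CSV has been loaded into a SQLite table called `products` (a made-up name). The prompt wording in `build_prompt` is illustrative, and the two SQL strings at the bottom are hand-written stand-ins for what a model might return, not real model output.

```python
import sqlite3

# Sample rows from the question; the table name "products" is an assumption.
ROWS = [
    (1, "teddy bear", "brown", "polyester", 10),
    (2, "panda", "black and white", "cotton", 20),
    (3, "giraffe", "yellow", "plush", 30),
]

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, "
        "color TEXT, material TEXT, price REAL)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn

def build_prompt(question):
    # Illustrative prompt text; send it to whichever model you use.
    return (
        "Table products(id, name, color, material, price).\n"
        "Write one SQLite SELECT statement that answers:\n"
        f"{question}\nSQL:"
    )

def run_select(conn, sql):
    # The SQL comes from a model, so only accept a single SELECT statement.
    stripped = sql.strip().rstrip(";")
    if not stripped.lower().startswith("select") or ";" in stripped:
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(stripped).fetchall()

conn = build_db()
# Hand-written stand-ins for the SQL a model might generate.
most_expensive = run_select(
    conn, "SELECT name FROM products ORDER BY price DESC LIMIT 1;"
)
not_cotton = run_select(
    conn, "SELECT name FROM products WHERE material <> 'cotton' ORDER BY id;"
)
```

Queries like “most expensive” or “not cotton” become ordinary `ORDER BY` and `WHERE` clauses this way, and only the question plus the schema goes to the model, not the whole list.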


I don’t think you can use SQL to query something like “Toys for a 3-year-old boy” and expect it to return a toy car instead of a Barbie (gender bias), or something like “Super heroes” and expect it to return a Batman figure.

That’s exactly the purpose of a relational database.
You can easily extract important information using simple logic.
You can also use entity extraction to convert a sentence into a database query. Keep in mind that this method won’t be perfect and will need continual training. I imagine you could tie the two together for further training/testing

“Toys for 3-year old boy” → Extract(Age, Gender, Etc.) → select_from_database(age=3, gender=boy)
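A minimal sketch of that pipeline, under some assumptions: the `toys` table, its `age`, `gender` and `category` columns, and the sample rows are all made up for illustration, and the `extracted` dict stands in for whatever the extraction step returns.

```python
import sqlite3

# Columns the search is allowed to filter on (hypothetical schema).
ALLOWED_FIELDS = {"age", "gender", "category"}

def select_from_database(conn, **filters):
    """Build a parameterized query from the extracted entities."""
    clauses, params = [], []
    for field, value in sorted(filters.items()):
        if field not in ALLOWED_FIELDS:
            continue  # ignore any field the extractor made up
        clauses.append(f"{field} = ?")
        params.append(value)
    sql = "SELECT name FROM toys"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toys (name TEXT, age INTEGER, gender TEXT, category TEXT)"
)
conn.executemany(
    "INSERT INTO toys VALUES (?, ?, ?, ?)",
    [
        ("toy car", 3, "boy", "vehicle"),
        ("doll", 3, "girl", "doll"),
        ("robot kit", 8, "boy", "science"),
    ],
)

# Stand-in for the extraction result of "Toys for 3-year old boy".
extracted = {"age": 3, "gender": "boy", "mood": "happy"}
results = select_from_database(conn, **extracted)
```

The allow-list is there because the extraction step can return fields that don’t exist in the table; those are dropped instead of breaking the query.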


Extracting information using the APIs and using it to query the database is a good idea :+1: I gave it a try and it looks promising.

But as you said it’s not reliable, for example, returning “Batman” from the prompt “superhero” is not straightforward.

Look at this sample. I’m looking for something like this, but at scale with a lot of inputs

or this output from Bing:


The model won’t return Batman, it will return a query.

This problem is more about planning the structure of your DB than about consuming the OpenAI API.

e.g. you could have a category attribute that contains the entry superhero for the toys that fall into that category.

The model will generate a query, which will return all the toys with category == "superhero"
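One possible way to structure this, sketched with SQLite. The `product_categories` table and the sample rows are assumptions for illustration; a separate table is used here so that a toy can sit in more than one category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_categories (product_id INTEGER, category TEXT);
INSERT INTO products VALUES
    (1, 'Batman figure'), (2, 'teddy bear'), (3, 'Spider-Man mask');
INSERT INTO product_categories VALUES
    (1, 'superhero'), (1, 'action figure'), (2, 'plush'), (3, 'superhero');
""")

def products_in_category(conn, category):
    """Return the names of all products filed under the given category."""
    rows = conn.execute(
        "SELECT p.name FROM products p "
        "JOIN product_categories c ON c.product_id = p.id "
        "WHERE c.category = ? ORDER BY p.id",
        (category,),
    )
    return [name for (name,) in rows]

superheroes = products_in_category(conn, "superhero")
```

The model only has to map “superhero” to the category value; which toys carry that category is decided once, when the data is entered.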


As sps says (Happy birthday!), it’s a matter of structuring your database and creating a pipeline to manage each separate function.

Here’s a thought for your situation:
Instead of querying GPT as a database, why not use GPT to create tags for each product? You can then store the tags in your database and run some nice analytics on them as well. I asked ChatGPT and this was its answer:

Create tags that relate the following item to an item store.

Item: Batman action figure

Tags:
[RESP] Batman, action figure, superhero, DC Comics, merchandise, collectibles, toy store, comic book store, pop culture.

For some reason my line separator doesn’t appear here. Another great aspect for this idea is that you would only ever need to query GPT once for each item, instead of each time someone searches something.
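A rough sketch of how the stored tags could be used at search time. It assumes the tags come back as one comma-separated line like the response above; the in-memory `index` stands in for a real database table, and the second product’s tags are invented for the example.

```python
from collections import defaultdict

def parse_tags(response):
    """Split a comma-separated tag response into lowercase tags."""
    tags = []
    for raw in response.split(","):
        tag = raw.strip().rstrip(".").lower()
        if tag:
            tags.append(tag)
    return tags

# Inverted index: tag -> set of product ids (stand-in for a DB table).
index = defaultdict(set)

def add_product(product_id, response):
    # Called once per product, when its tags are generated.
    for tag in parse_tags(response):
        index[tag].add(product_id)

def search(query):
    """Return ids of products with at least one tag appearing in the query."""
    q = query.lower()
    hits = set()
    for tag, ids in index.items():
        if tag in q:  # crude substring match, good enough for a sketch
            hits |= ids
    return sorted(hits)

add_product(
    1,
    "Batman, action figure, superhero, DC Comics, merchandise, collectibles, "
    "toy store, comic book store, pop culture.",
)
add_product(2, "Teddy bear, plush, soft toy, gift.")  # invented tags
found = search("superhero toys")
```

Searching never touches the model here; only `add_product` would need a GPT call, once per item.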


Thanks! And yes, it’s a great idea to generate tags for each product. The goal is to solve the problem, not to consume the API.

Extracting tags using GPT is a great idea. Combined with some local filters to remove repetitive tags and reject unhelpful ones, it’s a very useful way to improve the search cheaply :+1::+1:
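Something like this is what I have in mind for the local filtering. The `REJECTED` stoplist and the plural handling are guesses at what would be useful, not a tested rule set, and the naive plural stripping will mangle some tags (e.g. “DC Comics”).

```python
# Hypothetical stoplist of tags too generic to help search in a toy catalog.
REJECTED = {"merchandise", "toy store", "toy", "product"}

def normalize(tag):
    # Lowercase, collapse whitespace, and drop a simple trailing plural "s".
    t = " ".join(tag.lower().split())
    if t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        t = t[:-1]
    return t

def filter_tags(tags):
    """Drop rejected tags and repeats, keeping first-seen order."""
    seen, kept = set(), []
    for tag in tags:
        key = normalize(tag)
        if key in REJECTED or key in seen:
            continue
        seen.add(key)
        kept.append(key)
    return kept

cleaned = filter_tags(
    [
        "Superhero",
        "superhero ",
        "Action  Figure",
        "action figures",
        "merchandise",
        "Toys",
        "collectibles",
    ]
)
```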

It seems that my idea of searching a large set of unstructured data is not achievable with the current APIs, at least not cheaply.
