Seeking Advice on Handling Large Vehicle Database for AI Chatbot Application

Our client requires an AI chatbot that can answer questions about the 250 vehicles in their inventory. However, due to token limitations, our current system can only handle information for up to 40 vehicles. We are looking for suggestions on how to overcome this limit and make the full inventory available to the chatbot efficiently.

Do you mind providing more details about the use case? For example, what the vehicle data look like, and what kind of questions you plan to handle.


I have information on 250 vehicles, similar to the example below:

Brand: Volkswagen
Trim level: 1.4 Tsi Bmt Highline Dsg 125HP
Model: Golf
Price: 1,059,000 ₺
Fuel type: Gasoline
Year: 2018
Number of keys: 2
Mileage: 25,857 km
Body type: Hatchback
Engine size: 1.4 L
Color: Light grey
Seating capacity: 5
Paint type: Metallic
Transmission: Automatic
Wheel drive: Front wheel drive
Replaced parts: 0
Repainted parts: 0
Exterior equipment: Panoramic sunroof, Front parking sensor, Rear parking sensor
Interior equipment: Electronic air conditioner, Armrest (front), Armrest (back), Start/stop
Entertainment: Bluetooth

Due to token limitations, I cannot include information on all 250 vehicles in the prompt. I can expose this information through an API, and using an API has been suggested, but I am not sure whether there is a better approach.
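Exposing the inventory as an API fits well with function (tool) calling: the model only sees a short tool description, your code runs the search, and only the matching vehicles are sent back into the conversation. Below is a minimal sketch, assuming the records are available as Python dicts. The tool name `search_vehicles`, its parameters, and the second sample vehicle are hypothetical. The tool definition follows the JSON-schema shape used by the `tools` parameter of OpenAI's Chat Completions API; the API call itself is omitted.

```python
import json

# Hypothetical sample inventory; in practice this would come from your own API or database.
VEHICLES = [
    {"id": "V000-001", "brand": "Volkswagen", "model": "Golf", "year": 2018,
     "price_try": 1059000, "fuel": "Gasoline", "transmission": "Automatic"},
    {"id": "V000-002", "brand": "Renault", "model": "Clio", "year": 2020,
     "price_try": 850000, "fuel": "Diesel", "transmission": "Manual"},
]

# Tool description the model sees (JSON-schema parameters); the name is hypothetical.
SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_vehicles",
        "description": "Search the vehicle inventory and return matching vehicles.",
        "parameters": {
            "type": "object",
            "properties": {
                "brand": {"type": "string"},
                "fuel": {"type": "string"},
                "max_price_try": {"type": "number"},
            },
        },
    },
}


def search_vehicles(brand=None, fuel=None, max_price_try=None):
    """Return only the vehicles that match every filter that was given."""
    results = []
    for v in VEHICLES:
        if brand and v["brand"].lower() != brand.lower():
            continue
        if fuel and v["fuel"].lower() != fuel.lower():
            continue
        if max_price_try is not None and v["price_try"] > max_price_try:
            continue
        results.append(v)
    return results


def run_tool_call(arguments_json):
    """Execute a tool call whose arguments arrive as a JSON string, as the model emits them."""
    args = json.loads(arguments_json)
    return json.dumps(search_vehicles(**args))
```

With this design, the 40-vehicle ceiling applies to the result set of a single search rather than to the whole inventory.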

Information on 250 vehicles already amounts to a dataset.

Since the dataset is plain text, in this specific case I recommend numbering every item:

Vehicle_itemNo: V000-001;
V000-001_Brand: Volkswagen;
V000-001_Trim level: 1.4 Tsi Bmt Highline Dsg 125HP;
V000-001_Model: Golf;
V000-001_Price: 1,059,000 ₺;
V000-001_Fuel type: Gasoline;
V000-001_Year: 2018;
V000-001_Number of keys: 2;
V000-001_Mileage: 25,857 km;
V000-001_Body type: Hatchback;
...

Keep the dataset structure (field order) as consistent as possible, even with the numbered itemization.
It is also recommended to insert a blank line between records (vehicles).
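The itemization scheme above can be generated rather than typed by hand. Below is a minimal sketch, assuming each vehicle is a Python dict whose keys are already in the desired field order. The function names are hypothetical, and the fixed `000` group in `V000-001` is copied from the example because its meaning is not specified. Each field ends with `;`, and, following the License plate example, the final field ends with a period. Records are separated by a blank line and keep their input order.

```python
def format_record(index, vehicle):
    """Render one vehicle as numbered fields: ';' after each field, '.' after the last."""
    prefix = f"V000-{index:03d}"  # assumption: fixed "000" group, as in the example
    lines = [f"Vehicle_itemNo: {prefix};"]
    for field, value in vehicle.items():
        lines.append(f"{prefix}_{field}: {value};")
    lines[-1] = lines[-1][:-1] + "."  # the last field ends with a period
    return "\n".join(lines)


def format_dataset(vehicles):
    """Join all records in input order, with a blank line between vehicles."""
    return "\n\n".join(format_record(i, v) for i, v in enumerate(vehicles, start=1))


# Usage example:
print(format_dataset([{"Brand": "Volkswagen", "Model": "Golf", "Year": 2018}]))
# Vehicle_itemNo: V000-001;
# V000-001_Brand: Volkswagen;
# V000-001_Model: Golf;
# V000-001_Year: 2018.
```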

In the future, if necessary, append new fields at the end:

...
V000-001_License plate: XXXXXX.
...

Recommended punctuation for fields and records:
One-line (horizontal) record: vehicle: data1, data2, data3, ..., data_n;
Multi-line (vertical) record:
vehicle:
data1;
data2;
data3;
...;
data_n. # <== “.” in the last field;
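The one-line (horizontal) form can be produced the same way. Below is a minimal sketch, assuming the same dict-per-vehicle layout; the function name is hypothetical, and only the values are written, as in `vehicle: data1, data2, ..., data_n;`. Because values such as `1,059,000 ₺` or the equipment lists contain commas themselves, the sketch refuses those records instead of producing an ambiguous line.

```python
def one_line_record(item_no, vehicle):
    """Render a record horizontally as 'item_no: v1, v2, ..., vn;'."""
    values = [str(value) for value in vehicle.values()]
    # A value containing a comma (e.g. "1,059,000 ₺") would be indistinguishable
    # from a field separator, so such records should use the multi-line form.
    if any("," in value for value in values):
        raise ValueError(f"{item_no}: a value contains ',', use the multi-line form")
    return f"{item_no}: {', '.join(values)};"
```

In practice this means the multi-line form is the safer default for this dataset, since several fields contain commas.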

Upload the dataset to a storage service of your choice, such as Amazon S3, Google Cloud Storage, or Microsoft Azure Storage, and check that the dataset is accessible to the model.

The dataset header is typically used to provide metadata about the dataset, such as the author, license, and description. It can be in free format, JSON, or any other format, as long as it is clear to the model. For example (free format):

{dataset header begin:
Author: asdfg hjjkl
...
Instructions:
1. ...
2. ...
...
dataset header: end.}

{dataset begin:
Vehicle_itemNo: V000-001;
V000-001_Brand: Volkswagen;
...
dataset end.}

You can include any additional information you want in the header, including instructions for the model.
The instructions should be clearly labeled and formatted so that they are easily recognizable as instructions.

You can also guide the model by providing a structured record template in the dataset header.
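Assembling the header (metadata, instructions, record template) and the records into one document is plain string assembly. Below is a minimal sketch, assuming the records have already been rendered as text. The begin/end markers mirror the free-format example; the function name and the header values in the test are hypothetical. Note that the header is ordinary prompt text, so it counts toward the same token limit as the records.

```python
def build_dataset_document(header_fields, instructions, template, records_text):
    """Wrap a free-format header and the record block in begin/end markers."""
    lines = ["{dataset header begin:"]
    lines += [f"{key}: {value}" for key, value in header_fields.items()]
    lines.append("Instructions:")
    lines += [f"{n}. {text}" for n, text in enumerate(instructions, start=1)]
    lines.append("Record template:")
    lines.append(template)
    lines.append("dataset header: end.}")
    lines.append("")  # blank line between the header and the dataset
    lines += ["{dataset begin:", records_text, "dataset end.}"]
    return "\n".join(lines)
```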

Then you can pass the dataset URL to the model through the system message or the user prompt, for example: "Please use the dataset located at [insert URL here] to train the model."

Or in Python, for example:

```python
import openai

# Set up your OpenAI API key
openai.api_key = "YOUR_API_KEY"

# Set the URL of your dataset
dataset_url = "https://storage.googleapis.com/my_dataset.jsonl"

# Set the prompt to use with the model.
# Note: the URL reaches the model only as text; the model does not download the file itself.
prompt = "Generate some text using my dataset: " + dataset_url
...
```

I hope this helps.


You have explained it in great detail, Alex. Thank you very much, my friend. I hope it works.


Hi Cedric, I work at a large call center company. I will try this product and show it to my teammates. I hope it performs well.

What types of questions do you ask about the vehicles? Have you tried storing each vehicle's information as an embedding in a vector database, then matching it against the question to retrieve the relevant vehicle information when answering?
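The retrieval idea can be sketched without any external service. A real setup would call an embedding model and store the vectors in a vector database; so that the sketch runs offline, it swaps in a toy bag-of-words vector and an in-memory index, which is a deliberate simplification rather than a real embedding. The class and function names are hypothetical. Only the top-k matching records would then be placed in the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VehicleIndex:
    """In-memory stand-in for a vector database: one vector per vehicle record."""

    def __init__(self, records):
        self.records = records  # dict: item number -> record text
        self.vectors = {item: embed(text) for item, text in records.items()}

    def top_k(self, question, k=3):
        """Return the k record texts most similar to the question."""
        q = embed(question)
        ranked = sorted(self.vectors, key=lambda item: cosine(q, self.vectors[item]), reverse=True)
        return [self.records[item] for item in ranked[:k]]
```

Swapping `embed` for a real embedding call and `VehicleIndex` for a vector store keeps the same flow: embed every record once, embed each question, and send only the closest records to the model.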

Also, I see that the data is highly structured. If you put the information into a flat table, an SQL database chain (such as LangChain's SQLDatabaseChain) may also work well. There is an example using Semantic Kernel, but the same can be done with LangChain. Below is the Semantic Kernel example.
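To illustrate the flat-table approach, here is a minimal sketch using Python's built-in `sqlite3`. In an SQL database chain, the model would write the SQL from the user's question; here the query is hand-written so the sketch runs without an LLM. The table layout, column names, and the second sample vehicle are hypothetical. The SELECT-only guard is a simple precaution, since model-generated SQL should not be allowed to modify the inventory.

```python
import sqlite3

# Hypothetical flat table holding some of the fields from the example record.
SCHEMA = """
CREATE TABLE vehicles (
    item_no TEXT PRIMARY KEY,
    brand TEXT, model TEXT, year INTEGER,
    price_try INTEGER, mileage_km INTEGER,
    fuel_type TEXT, transmission TEXT
)
"""


def load_inventory(rows):
    """Create an in-memory SQLite database and insert the vehicle rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO vehicles VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def answer_query(conn, sql, params=()):
    """Run a query (e.g. one generated by the model), allowing SELECT statements only."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT queries are allowed")
    return conn.execute(sql, params).fetchall()
```

Only the query result, not the whole table, then goes back to the model to phrase the answer.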


Hi Zeki, to address the token limitations for a chatbot handling a database of 250 vehicles, consider implementing pagination. Load and present information for a subset of vehicles at a time, and load additional details dynamically as needed. This lets the chatbot work through the full inventory within the existing limits, giving access to all 250 vehicles without exceeding the token limit in any single request.
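Pagination can be sketched as splitting the rendered records into fixed-size pages and sending one page per request. Below is a minimal sketch; the page size of 40 is taken from the capacity mentioned in the original question, and the function names are hypothetical. How the chatbot decides to move to the next page (for example, when the user asks to see more) is left open.

```python
def paginate(items, page_size=40):
    """Split items into consecutive pages of at most page_size entries."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


def page_context(items, page, page_size=40):
    """Build the context block for one 1-based page, with a 'page X of Y' label."""
    pages = paginate(items, page_size)
    if not 1 <= page <= len(pages):
        raise ValueError(f"page must be between 1 and {len(pages)}")
    header = f"Vehicles page {page} of {len(pages)}:"
    return "\n\n".join([header] + pages[page - 1])
```

With 250 records and 40 per page, this yields 7 pages, the last holding 10 vehicles. Questions that span the whole inventory (such as "which car is cheapest?") still need a search or query step, because a single page only shows part of the data.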