Seeking Advice on Handling Large Vehicle Database for AI Chatbot Application

zeki.unyildiz · April 27, 2023, 1:46pm

Our client requires an AI chatbot that can handle and answer questions about 250 vehicles in their inventory. However, due to token limitations, our current system can only manage information for up to 40 vehicles. Seeking suggestions on how to overcome this challenge and efficiently implement a larger database within our AI chatbot.

CreatiCode · April 27, 2023, 2:51pm

Do you mind providing more details about the use case? For example, what the vehicle data look like, and what kind of questions you plan to handle.

zeki.unyildiz · April 27, 2023, 6:19pm

I have information on 250 vehicles, similar to the example below:

Brand: Volkswagen
Trim level: 1.4 Tsi Bmt Highline Dsg 125HP
Model: Golf
Price: 1,059,000 ₺
Fuel type: Gasoline
Year: 2018
Number of keys: 2
Mileage: 25,857 km
Body type: Hatchback
Engine size: 1.4 L
Color: Light grey
Seating capacity: 5
Paint type: Metallic
Transmission: Automatic
Wheel drive: Front wheel drive
Replaced parts: 0
Repainted parts: 0
Exterior equipment: Panoramic sunroof, Front parking sensor, Rear parking sensor, Interior equipment, Electronic air conditioner, Armrest (front), Armrest (back), Start/stop, Entertainment, Bluetooth

Due to token limitations, I cannot provide information for all 250 vehicles. I can offer this information as an API, and using an API has been suggested, but I am not sure if there is a different recommendation.

AlexDeM · April 27, 2023, 7:50pm

The information on 250 vehicles is a dataset size already.

The dataset is in text format only. In this specific case, I recommend number itemization:

Vehicle_itemNo: V000-001;
V000-001_Brand: Volkswagen;
V000-001_Trim level: 1.4 Tsi Bmt Highline Dsg 125HP;
V000-001_Model: Golf;
V000-001_Price: 1,059,000 ₺;
V000-001_Fuel type: Gasoline;
V000-001_Year: 2018;
V000-001_Number of keys: 2;
V000-001_Mileage: 25,857 km;
V000-001_Body type: Hatchback;
...

Keep the dataset structure (order) as much as possible, even with the numbered itemization.
It’s also recommended to insert a blank line between records (vehicles);

In the future, if necessary, append new fields at the end:

...
V000-001_License plate: XXXXXX.
...

It is recommended punctuation in fields and records:
One-line (horizontal) record: vehicle: data1, data2, data3, ..., data_n;
Multi-line (vertical) record:
vehicle:
data1;
data2;
data3;
...;
data_n. # <== “.” in the last field;

The dataset shall be uploaded to a storage service of your choice - such as Amazon S3, Google Cloud Storage, or Microsoft Azure Storage. Check the availability of the dataset for the model.

The dataset header is typically used to provide metadata about the dataset, such as the author, license, and description. It can be in free format, JSON format, or whatever you like - since it is clear for the model. Like this (free format):

{dataset header begin:
Author: asdfg hjjkl
...
Instructions:
1. ...
2. ...
...
dataset header: end.}

{dataset begin:
Vehicle_itemNo: V000-001;
V000-001_Brand: Volkswagen;
...
dataset end.}

You can include any additional information you want in the header, including instructions for the model.
The instructions should be clearly labeled and formatted so that they are easily recognizable as instructions.

You may train the model, by providing a structured record template in the dataset header.

Then you can provide the model with the dataset URL through the System role or User prompt. For example, you can provide a prompt such as Please use the dataset located at [insert URL here] to train the model.

Or in Python, for example:

import openai
# Set up your OpenAI API key
openai.api_key = "YOUR_API_KEY"

# Set the URL of your dataset
dataset_url = "https://storage.googleapis.com/my_dataset.jsonl"

# Set the prompt to use with the model
prompt = "Generate some text using my dataset: " + dataset_url
...

I hope this helps.

zeki.unyildiz · April 28, 2023, 11:59am

You have explained it in great detail, Alex. Thank you very much my friend, I hope it works

zeki.unyildiz · September 13, 2023, 12:04pm

Hi Cedric, I work in a large call center company. I will try this product and show it to my teammates. I hope I get good performance.

joyasree78 · September 17, 2023, 7:03pm

What are the types of question you ask against the vehicles. Have you tried storing each vehicles information as embedding in a vector database and then match it with the questions to bring the relevant vehicle information while answering.

Also, i see that the data is very structured. If you create a flat table with the information, a sqldbchain may also work fine. There is an example using semantic kernel, but the same can be done using langchain also. Below is the semantic kernel example.

soffosdotai · December 12, 2023, 7:30pm

Hi Zeki, To address token limitations for the AI chatbot handling a database of 250 vehicles, consider implementing pagination. Load and display information for a subset of vehicles at a time, dynamically loading additional details as needed. This allows the chatbot to efficiently manage a larger database within the existing constraints, providing seamless access to all 250 vehicles without overwhelming the token limit.

s.lyapustin · December 5, 2024, 5:41pm

You need to look in to the RAG direction. Search for some videos on RAG Llama index in Youtube to get an idea.

Topic		Replies	Views
Leveraging LLMs with Vast Mechanic Datasets and Guides API api	6	2154	August 31, 2023
About the usage of ChatGPT Embedding API	9	4368	August 18, 2023
Seeking guidance on managing long conversations and token limits while implementing ChatGPT in a mobile app for a design application API	6	2306	November 15, 2023
Big vector DB making prompting impossible API assistants-api	5	158	November 16, 2024
RAG or Fine tuning for a domain specific QA chatbot API rag , development , chatbot , assistants-api	4	1334	July 3, 2024

Seeking Advice on Handling Large Vehicle Database for AI Chatbot Application

Related topics