Good afternoon from the Philippines! My name is Diego. I am an intern tasked with creating a chatbot that lets users upload Excel files as training data. From there, if a user asks a question, the chatbot should answer based on what has been uploaded in the dataset. To make this happen, I converted the Excel files into .jsonl. Initially, we used fine-tuning for this. After testing, the chatbot was not very accurate and was really hit or miss when it came to being correct. After thorough research, I found out that using embeddings is a more appropriate way to go about this. How do I conceptualize this? Anyway, I’m just venting my frustration, as I feel like I’m not making any progress.
Hi @diegojumagdao - welcome to the Forum. No need to get frustrated - I’m sure you’ll get there.
Did you have a chance yet to take a look at the resources in the OpenAI cookbook? There’s quite a bit of guidance there in relation to embeddings etc.
This one is a good example to get started with:
There are quite a few other examples available there. Specifics of the approach will depend on whether you are looking to leverage a vector database and if so, which one.
Thanks! I’ll check it out and I’ll get back to you!
The Forum is a great resource. The more specific you can be in your question(s), the better advice you are likely to get.
I just tested it on a few different Wikipedia articles of my own and I’m pretty satisfied with the results. My question is whether embeddings can also work with xls files instead of only csv files. If not, I’m thinking of just creating an xls to csv converter whose output can then be used with embeddings. Is the next step now to use a vector DB like Pinecone? We primarily use Microsoft SQL Server, but I can definitely introduce a vector DB in the workplace.
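On the xls vs csv question: an embeddings model takes text, not files, so either format works once each row has been read into a string. As a rough sketch of what a vector DB then does with those vectors, here is a tiny in-memory search. The toy 3-dimensional vectors and the names `cosine_similarity` and `top_k` are made up for illustration; real embeddings come from an embeddings model and are much longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, treat as "not similar"
    return dot / (norm_a * norm_b)

def top_k(query_vector, indexed_rows, k=3):
    """Return the k (score, text) pairs most similar to the query.

    indexed_rows is a list of (text, vector) pairs, one per spreadsheet row.
    """
    scored = [(cosine_similarity(query_vector, vec), text)
              for text, vec in indexed_rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy vectors stand in for real embeddings (assumption for illustration).
index = [
    ("Product: Widget, Price: 10", [1.0, 0.0, 0.0]),
    ("Product: Gadget, Price: 25", [0.0, 1.0, 0.0]),
    ("Product: Gizmo, Price: 40", [0.0, 0.0, 1.0]),
]
results = top_k([0.9, 0.1, 0.0], index, k=1)
print(results[0][1])
```

For a small dataset a loop like this may be enough; a vector DB mainly becomes useful when there are too many rows to scan on every question.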
Welcome to the dev forum @diegojumagdao
Here’s a breakout session from DevDay 2023 that should help provide more clarity:
Happy birthday @sps !
Consider using the Python pandas library to read the data into DataFrames and then derive string inputs that fit within the context window of the OpenAI chat API (4,096 tokens, not characters, for the original gpt-3.5-turbo).
Example:
import pandas as pd
# read in CSV file data
df = pd.read_csv("CSV Input File.csv")
# OR read in Excel file data
# df = pd.read_excel("Excel Input File.xlsx")
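To make the "derive string inputs" step a bit more concrete, here is a minimal sketch that turns rows into text and stops before a size budget is exceeded. It assumes the rows are plain dicts, which is the shape `df.fillna("").to_dict(orient="records")` gives you from a DataFrame. The names `row_to_text` and `build_context` and the `max_chars` value are hypothetical, and counting characters is only a crude stand-in for the model's real limit.

```python
def row_to_text(record):
    """Render one row as 'column: value' pairs, skipping empty cells."""
    return "; ".join(
        f"{col}: {val}" for col, val in record.items() if val not in (None, "")
    )

def build_context(records, max_chars=4000):
    """Join row strings until the (assumed) character budget is reached."""
    lines, used = [], 0
    for record in records:
        line = row_to_text(record)
        cost = len(line) + 1  # +1 for the newline that joins the rows
        if used + cost > max_chars:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)

# With pandas this list could come from:
#   records = df.fillna("").to_dict(orient="records")
records = [
    {"product": "Widget", "price": 10, "notes": ""},
    {"product": "Gadget", "price": 25, "notes": "blue"},
]
context = build_context(records)
print(context)
```

The resulting string can be placed in the prompt ahead of the user's question. For larger sheets, only the rows relevant to the question should go in, which is where embeddings come in.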
Thank you! It seems like RAG with embeddings is the way to go.
I’ve been trying to follow along with the API documentation and it led me here.
Get embeddings from dataset | OpenAI Cookbook
However, it seems that embeddings_utils.py was dropped according to here:
v1.0 drops embeddings_util.py breaking semantic text search · Issue #676 · openai/openai-python · GitHub.
This essentially breaks the following import, which the semantic search depends on:
from utils.embeddings_utils import get_embedding
It is used in the “Get embeddings from dataset” cookbook guide.
I’m a little bit lost since I’m trying it out for myself and I can’t seem to proceed. Are there any alternatives?
Yes, embeddings_utils has been removed from the latest version of the openai package.
You can download the embeddings_utils.py file from the cookbook directly and import the utils the code uses from there.
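If you only need `get_embedding`, another option is to write a small stand-in yourself. This is a minimal sketch, not the cookbook's file: it assumes openai>=1.0, takes the client as an explicit argument (my own choice, so it can be swapped out in tests), and defaults to the `text-embedding-ada-002` model.

```python
def get_embedding(text, client, model="text-embedding-ada-002"):
    """Return the embedding vector for one string.

    client is expected to be an openai.OpenAI() instance (openai>=1.0).
    """
    text = text.replace("\n", " ")  # flatten newlines before embedding
    response = client.embeddings.create(input=[text], model=model)
    return response.data[0].embedding

# Usage (needs the openai package and OPENAI_API_KEY set in the environment):
#   from openai import OpenAI
#   client = OpenAI()
#   vector = get_embedding("Product: Widget, Price: 10", client)
```

The returned list of floats is what you would store alongside each row and compare against the embedding of the user's question.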
As of today, OpenAI hasn’t been able to answer accurately from Excel files. We have tested it for multiple customers at Kommunicate and have come up with a solution using the Pandas AI library.
There are multiple ways to solve it:
- Convert the user input to filter criteria and then run them against the Excel sheet
- Directly use the Pandas AI library, which internally uses prompts to generate Python code to get the answers. Based on our research, its answers are nearly accurate.
Flow for #1 Approach
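As a rough illustration of approach #1 (my own sketch, not Kommunicate's actual implementation), the filtering half could look like the code below. It assumes the LLM step has already turned the user's question into a list of `(column, operator, value)` conditions; that output format, the sample rows, and the `apply_filter` name are all hypothetical.

```python
import operator

# Comparison operators the filter is allowed to use.
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def apply_filter(rows, criteria):
    """Keep the rows where every (column, op, value) condition holds."""
    def matches(row):
        return all(OPS[op](row[col], value) for col, op, value in criteria)
    return [row for row in rows if matches(row)]

# Rows as they might look after reading the sheet into dicts.
rows = [
    {"product": "Widget", "price": 10},
    {"product": "Gadget", "price": 25},
    {"product": "Gizmo", "price": 40},
]

# Hypothetical LLM output for "Which products cost more than 20?"
criteria = [("price", ">", 20)]
matching = apply_filter(rows, criteria)
print([row["product"] for row in matching])
```

Because the model only produces the criteria and the filtering itself is ordinary code, the numbers in the answer come straight from the sheet rather than from the model's memory.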
Alternatively, you can use our product Kommunicate to achieve the same.
If so, then does that mean that the documentation is outdated? I’ll give your suggestion a try! Right now, my chatbot is working in my terminal after trying out @jr.2509’s suggestion from the OpenAI cookbook.
It’s not out of date. The utils were just migrated, as @logankilpatrick points out in the resolution of the issue:
Hey folks, we migrated these over to the cookbook’s own utils folder ~3 months ago: openai-cookbook/examples/utils/embeddings_utils.py at main · openai/openai-cookbook · GitHub, if you find any notebooks that are out of sync and not using the built-in utils, please open an issue on the cookbook repo.
Interesting, I have built something like this for a friend’s business, allowing them to give a chatbot to their customers for product enquiries.