Has anyone successfully used OpenAI to interpret data sets?

jj1 · February 10, 2023, 9:14am

I have some large cohort-analysis-style data sets (mainly for marketing/sales) as well as less complex ones (monthly sales bar charts, etc.), and I’m curious whether anyone has succeeded in packaging these data sets in a way that lets OpenAI provide conversational insight.

Would love to chat with you if you’ve succeeded in this!

Thanks!

udm17 · February 10, 2023, 11:56am

I think the LangChain package has a function that lets you provide additional data along with the prompt in a GPT call. Might be worth having a look.
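
To illustrate the general idea of sending data along with the prompt, here is a minimal sketch in plain Python. It does not use LangChain or call any API; `build_prompt`, the `max_rows` cap, and the sample `monthly_sales` rows are all hypothetical names and made-up values for illustration. The resulting string would be what you send as the prompt.

```python
import csv
import io


def build_prompt(rows, question, max_rows=20):
    """Render rows as CSV text and place them ahead of the question.

    Hypothetical helper: max_rows is a crude guard against overrunning
    the model's token limit, not an exact token count.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows[:max_rows])
    return (
        "Answer the question using only the data below.\n\n"
        f"Data (CSV):\n{buffer.getvalue()}\n"
        f"Question: {question}\n"
    )


# Made-up sample data, for illustration only.
monthly_sales = [
    {"month": "2023-01", "sales": 1200},
    {"month": "2023-02", "sales": 1500},
    {"month": "2023-03", "sales": 900},
]

prompt = build_prompt(monthly_sales, "Which month had the highest sales?")
print(prompt)
```

This works for small tables like monthly sales. Larger data sets will not fit in a single prompt and need to be summarized or retrieved selectively first.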

kevin6 · February 10, 2023, 12:27pm

Yes, I have done a similar project. It sounds like basic question-answering retrieval. For testing, you can follow this: Question answering using embeddings-based search | OpenAI Cookbook
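
As a rough sketch of what embeddings-based search does (not the Cookbook's code): each text chunk is stored with an embedding vector, the question is embedded the same way, and the chunks closest to the question by cosine similarity are passed to the model as context. The vectors below are tiny made-up stand-ins; in practice they would come from an embeddings model. `cosine_similarity`, `top_k`, and the sample chunks are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for text, vec in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: (chunk text, made-up 3-dimensional "embedding").
indexed_chunks = [
    ("Notes on cohort retention.", [0.9, 0.1, 0.0]),
    ("Notes on monthly sales.", [0.1, 0.9, 0.1]),
    ("Notes on office logistics.", [0.0, 0.1, 0.9]),
]

# Stand-in for the embedded question; assumed to be about retention.
query_vec = [0.8, 0.2, 0.0]

results = top_k(query_vec, indexed_chunks, k=2)
print(results)
```

The retrieved chunks are then placed in the prompt ahead of the question, so only the relevant slice of the data counts against the token limit.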

For production, I suggest using a vector search engine like Weaviate.
I’m trying to open source some of my projects these days. I will update this post later.

lmccallum · February 10, 2023, 3:18pm

Hi @kevin6. Do you have any insight into whether OpenAI will increase the maximum number of tokens that can be sent in a prompt to the completions endpoint? I’ve built a Q&A system and the token limit has been one of the bigger challenges to work around. Thanks.

kevin6 · February 10, 2023, 5:50pm

Depending on your dataset, the best option is to split the articles into smaller chunks. Even if they increase the max tokens, with the current architecture it will not be more than 8k tokens.
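
A minimal sketch of that kind of chunking, assuming whitespace-separated words as a rough proxy for tokens (a real tokenizer will count differently, so leave headroom). `split_into_chunks`, `max_tokens`, and `overlap` are hypothetical names; the overlap repeats a few words between neighbouring chunks so a sentence cut at a boundary still appears whole in one of them.

```python
def split_into_chunks(text, max_tokens=200, overlap=20):
    """Split text into word-based chunks of at most max_tokens words.

    Consecutive chunks share `overlap` words. Words are only an
    approximation of model tokens.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once this chunk has reached the end of the text.
        if start + max_tokens >= len(words):
            break
    return chunks


# Synthetic 500-word "article" for illustration.
article = " ".join(f"word{i}" for i in range(500))
chunks = split_into_chunks(article, max_tokens=200, overlap=20)
print(len(chunks), [len(c.split()) for c in chunks])
```

Each chunk can then be embedded and indexed separately, so only the most relevant chunks are sent with a question.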

lmccallum · February 10, 2023, 6:29pm

Thanks. The token limit for embeddings is now 8000, so I was thinking they might do the same for completions relatively soon. Do you think it’s possible that Bing’s or Google’s version of chat is capable of accepting more tokens? I’m very curious about the capabilities of the API endpoints vs. the capabilities of the architecture that the big companies are using for their public-facing search and chat tools.

kevin6 · February 13, 2023, 3:34pm

Bing and Google currently use a completely different architecture for search, but one option for them is to narrow down documents using their current architecture and then answer queries using larger language models. This is just speculation, but they probably have a lot of open questions about what information will and will not appear in their search results, and about fair representation in those results. Large companies will be dealing with many issues beyond simply providing the right information for users’ queries. Meanwhile, many documents are private, and many companies need applications for better search and question answering over their own data.
