Has anyone successfully used OpenAI to interpret data sets?

I have some large cohort-analysis-style data sets (mainly for marketing/sales) as well as simpler ones (monthly sales bar charts, etc.), and I’m curious whether anyone has succeeded in packaging these data sets in a way that lets OpenAI provide conversational insight.

Would love to chat with you if you’ve succeeded in this!


I think LangChain has a function that lets you provide additional data along with the prompt in a GPT call. It might be worth a look.
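To make the idea of sending data along with the prompt concrete, here is a minimal sketch in plain Python. It does not use LangChain or call any API. The function name `build_prompt`, the sample CSV, and the instruction wording are all made up for illustration. It only shows how a small table could be turned into text and placed ahead of the question.

```python
import csv
import io


def build_prompt(csv_text: str, question: str) -> str:
    """Render a small CSV table as plain text and place it before the question."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # One line per row, e.g. "month=2023-01, sales=1200"
    lines = [", ".join(f"{key}={value}" for key, value in row.items()) for row in rows]
    context = "\n".join(lines)
    return (
        "Answer the question using only the data below.\n\n"
        f"Data:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical monthly sales data, invented for this example.
sample_csv = "month,sales\n2023-01,1200\n2023-02,1500\n"
prompt = build_prompt(sample_csv, "Which month had higher sales?")
print(prompt)
```

The resulting string is what you would send as the prompt. This only works while the table is small enough to fit in the model's token limit.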

Yes, I have done a similar project. It sounds like basic question-answering retrieval. For testing, you can follow this: openai-cookbook/Question_answering_using_embeddings.ipynb at main · openai/openai-cookbook · GitHub

For production, I suggest using a vector search engine like Weaviate.
I’m trying to open source some of my projects these days. I will update this post later.
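As a rough illustration of the retrieval step in that kind of setup, here is a self-contained sketch. It is not the cookbook's code and does not call OpenAI or Weaviate. `toy_embed` is a hypothetical stand-in (a bag-of-words count vector) for a real embedding model, and the sample chunks are invented. The shape of the logic is the same: embed the question, score each chunk by cosine similarity, and keep the best matches to put into the prompt.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(question: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


# Invented example chunks; in practice these come from your own documents.
chunks = [
    "January sales were 1200 units across all regions.",
    "The March cohort had the highest retention after 90 days.",
    "Support tickets dropped in the second quarter.",
]
best = top_k("Which cohort had the highest retention?", chunks, k=1)
print(best)
```

In a real system the stand-in would be replaced by calls to an embedding model, and a vector search engine would do the ranking at scale instead of sorting every chunk.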

Hi @kevin6. Do you have any insight into whether OpenAI will increase the maximum number of tokens that can be sent in a prompt to the completions endpoint? I’ve built a Q&A system and the token limit has been one of the bigger challenges to work around. Thanks.

Depending on your dataset, the best option is to split the articles into smaller chunks. Even if they increase the max tokens, with the current architecture it will not be more than 8k tokens.
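A minimal sketch of that kind of splitting, assuming whitespace-separated words are a good enough proxy for tokens (real token counts will differ, so leave headroom). The function name `split_into_chunks` and the default sizes are hypothetical choices for illustration. The overlap keeps a little shared context between neighbouring chunks.

```python
def split_into_chunks(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most max_words words, with overlapping edges."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the window already reached the end of the text
    return chunks


# Tiny invented example so the overlap is easy to see.
article = " ".join(f"w{i}" for i in range(10))
pieces = split_into_chunks(article, max_words=4, overlap=1)
print(pieces)
```

Each chunk can then be embedded and searched separately, so only the most relevant ones need to fit in the prompt.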

Thanks. The token limit for embeddings is now 8000, so I was thinking they might do the same for completions relatively soon. Do you think it’s possible that Bing’s or Google’s version of chat is capable of accepting more tokens? I’m very curious about the capabilities of the API endpoints vs. the capabilities of the architecture that the big companies are using for their public-facing search and chat tools.

Bing and Google currently use a completely different architecture for search, but one option for them is to narrow down documents using their current architecture and then answer queries using large language models. This is just speculation, but they probably face many questions about which information appears in their search results, which does not, and about fair representation in those results. Large companies will be dealing with many issues beyond providing the right information for users’ queries. Meanwhile, many documents are private, and many companies need applications for better search and question answering over their own data.