Is there a way to upload my own data (e.g. a large text/PDF/whatever) and then create a custom API specifically for this data?
My goal is to make an API based on my company's own data.
Do you mean save your company's data on your company's servers and not OpenAI Storage? And have only authenticated users use those APIs?
I believe there are two ways you can explore: a) the embeddings API, which uses the RAG approach, or b) fine-tuning with the Assistants API.
If you want to draw responses from your company's data, you will likely end up with some form of RAG.
The data is chunked, and each chunk is embedded. The incoming question, along with any additional context or expansions (HyDE), is then correlated against this data.
The best-matching chunks are put into the prompt for the LLM to draw its response from. You may have to force it to stay within the confines of the data (also with prompting). Also, check your correlation values and make sure they are high enough to be truly relevant before you inject the chunks into the prompt.
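The retrieval step above can be sketched in plain PHP. This is a minimal illustration, not a full pipeline: in practice the vectors come back from an embeddings endpoint, while here the function names, the hard-coded vectors, and the cutoff value are all hypothetical.

```php
<?php
// Cosine similarity between two equal-length vectors. In a real RAG
// pipeline each vector is the embedding of a text chunk or of the query.
function cosine_similarity(array $a, array $b): float {
    $dot = 0.0; $na = 0.0; $nb = 0.0;
    foreach ($a as $i => $v) {
        $dot += $v * $b[$i];
        $na  += $v * $v;
        $nb  += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($na) * sqrt($nb));
}

// Return up to $k chunks whose embeddings correlate best with the query,
// dropping anything below $minScore (the "make sure they are high enough
// to be truly relevant" check before injecting into the prompt).
function top_chunks(array $queryEmbedding, array $chunkEmbeddings,
                    array $chunks, int $k, float $minScore): array {
    $scored = [];
    foreach ($chunkEmbeddings as $i => $emb) {
        $score = cosine_similarity($queryEmbedding, $emb);
        if ($score >= $minScore) {
            $scored[] = ['chunk' => $chunks[$i], 'score' => $score];
        }
    }
    usort($scored, fn($x, $y) => $y['score'] <=> $x['score']);
    return array_slice($scored, 0, $k);
}
```

Whatever `top_chunks()` returns is what you would concatenate into the prompt, ideally with an instruction telling the model to answer only from that material.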
Fine-tuning would be used to give the model a certain “tone”, but I would stay away from it until you get the RAG figured out.
OpenAI Storage? If it’s not public, that’s no problem. But if it’s visible to others, then it should be on my servers.
Sorry, but I don’t understand it. I have my data in e.g. a text document, but it’s too large to put into a regular prompt. I would like to tell the script something like “Check this text document [my data] and give me an answer based on it”.
Hi @szostakpiotr88!
I suggest you have a look at this guide. It will explain to you why RAG is indeed the right solution for your problem and why fine-tuning isn’t.
https://platform.openai.com/docs/guides/optimizing-llm-accuracy
You have a couple of different options. This involves either creating your own RAG solution OR taking advantage of OpenAI’s Assistants, which allow you to upload files to a so-called vector store and then use queries to get answers on the basis of the uploaded files. You can read up more here on the specifics of the process. You can test out Assistants in the Playground with very limited initial effort.
It’s really complex - I don’t know why OpenAI can’t just give the option to send a file as data on which to base a request.
What you’ve shown me looks heavily complicated and all over the place…
Thanks anyway
It’s not complex. Like @jr.2509 said, the OpenAI Assistant can help you. If you want, I can hop on a call with you to explain things.
You can also just build a custom GPT with no code at all. The process would take 5-10 minutes. But of course that comes with limitations.
But even just creating a first prototype of an Assistant in the Playground doesn’t take significantly more time and requires no coding. Of course, when you get to the point where you want to deploy the Assistant, it gets a bit more complex.
Hold on a second. Let’s assume that I use PHP. I can create a cURL connection with OpenAI, that’s no problem, but how do I then use any of what you mentioned?
I want to strictly use only PHP + the data that I have (let’s say it’s in some data.txt document)
That’d be nice, but maybe first try to make it clear to me how I should use PHP + the data in a file to achieve it?
Please explain like I’m five: I only know PHP and have my company’s data in a data.txt file.
c) You do not use embeddings at all; instead, you read the text and send it through the API as part of the prompt, so the model answers directly from that data without a full RAG pipeline.
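For a file small enough to fit in the model's context window, option (c) is a single PHP script: read data.txt, put its contents into the prompt, and POST to the Chat Completions endpoint with cURL. This is a rough sketch, not production code; the model name, the prompt wording, and the `OPENAI_API_KEY` environment variable are assumptions you would adapt.

```php
<?php
// Build the request body for the Chat Completions API. The document text
// goes into the system message so the model answers only from it.
// "gpt-4o-mini" is just an example model name — use whichever you prefer.
function build_payload(string $documentText, string $question): array {
    return [
        'model'    => 'gpt-4o-mini',
        'messages' => [
            ['role' => 'system',
             'content' => "Answer only from the following document:\n\n" . $documentText],
            ['role' => 'user', 'content' => $question],
        ],
    ];
}

// Only perform the live call when an API key is configured, so the sketch
// can be read (and the helper tested) without network access.
if (getenv('OPENAI_API_KEY') !== false) {
    $payload = build_payload(
        file_get_contents('data.txt'),
        'What does the document say about pricing?'
    );

    $ch = curl_init('https://api.openai.com/v1/chat/completions');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . getenv('OPENAI_API_KEY'),
        ],
        CURLOPT_POSTFIELDS     => json_encode($payload),
    ]);
    $response = json_decode(curl_exec($ch), true);
    curl_close($ch);

    echo $response['choices'][0]['message']['content'];
}
```

If data.txt is too large for one prompt, that is exactly the point where you would fall back to the chunking-and-retrieval (RAG) approach discussed earlier in the thread.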