Is RAG + fine-tuning only available with the Assistants API, or is it available via ChatGPT 3.5 Completions?

I want to create an AI assistant for my website.

I want the GPT to only answer questions about my website.

I want to upload the content of the website (scrape all the URLs and save the HTML in a .txt file) and use it for RAG. I also want to use fine-tuning with the required .jsonl file in the Q&A format defined in the OpenAI docs.

Is it possible to use RAG with a .doc file (a non-JSON file) AND fine-tuning, using ChatGPT3.5 or ChatGPT4 with the Completions API?

Or is the only real option to use the beta Assistants API? This is very unclear. It appears I must use the Assistants API if I want to upload files in .txt, .doc, or any format other than .jsonl. It is hard to scrape web pages and turn them into the .jsonl Q&A fine-tuning format, which is why I want to use RAG.

Hi! Welcome to the forums!

Retrieval Augmented Generation.

Well, it’s not really about the document type - technically you could do it with images, audio, or even smells. For the retrieval, you have a database that is queried on demand, to dynamically inject information into your prompt. Assistants does a rudimentary version of that for you, but it’s probably best served with a text file in markdown format.

Assistants probably chunks the document for you and then embeds it, so it's likely the easiest way to get started (apart from custom GPTs).
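To make that concrete, the retrieval step boils down to something like this. It's only a sketch: `search()` stands in for whatever store you use (your own vector database, Assistants retrieval, even keyword search) and isn't a real API.

```python
# Sketch of "inject information into your prompt": look relevant text up at
# question time and paste it into the messages you send to the model.
# search() is a placeholder for your own retrieval function.
def build_messages(question, search):
    passages = search(question, top_k=3)  # most relevant chunks for this question
    context = "\n\n".join(passages)
    return [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ]
```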

Fine Tuning

Fine-tuning is a training process. You can't really upload a Word document to the fine-tuning endpoint - what would that accomplish?

Fine tuning expects a jsonl of example conversations, to tell the AI how to behave. You give a bunch of examples to allow the AI to learn a style, of sorts. Maybe you want it to be more sarcastic or snarky. Maybe you want answers to be more terse and to the point.
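For reference, each line of that JSONL file is one complete example conversation, roughly like this (this assumes the chat fine-tuning format for gpt-3.5-turbo; the site and the answer are made up):

```jsonl
{"messages": [{"role": "system", "content": "You are the assistant for example.com."}, {"role": "user", "content": "What are your opening hours?"}, {"role": "assistant", "content": "We're open 9 to 5, Monday through Friday."}]}
```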

It’s not that great at retaining information, however.


is that enough information to get you started?

Thank you for your response. I think I get the general idea, but I do not understand if I can use ChatGPT3.5 or if I must use the new beta Assistants API.

Yes, I get that I am saving a document in a database, and OpenAI is going to charge me to store that file (or multiple files). But I am starting by scraping a website, so the content will be HTML, likely with CSS, tags, etc. I need to scrape all the pages of the website and save them.

Can I do that with ChatGPT3.5 and the Chat Completions endpoint, or must I use the new beta Assistants API for this task? This is for RAG.

I watched the entire OpenAI YouTube video, and it suggests testing with RAG + fine-tuning with ChatGPT3.5, but I do not see how I can do that, since ChatGPT3.5's playground only allows JSONL and I have web pages. It seems like one has to use the Assistants API, but it is in beta, so I would strongly prefer to use ChatGPT3.5, since I have already written the code to pass history in the correct format, etc.

But I cannot figure out how to do so, and I am starting to think I must use the Assistants API, and ONLY the Assistants API, to follow the recommended strategy of RAG with fine-tuning also applied. I think neither fine-tuning nor RAG alone will get the responses I want; I will need to use both.

So the Assistants API is my only possibility? Or is there a way I am not seeing to use the Chat Completions API with ChatGPT3.5 (some version) or ChatGPT4 (some version)?

Well, ChatGPT is actually this: chat.openai.com

If you want to make a custom GPT, that would probably work too, to an extent.

Personally, and this is my opinion, fine tuning is a waste of money for 99% of all use cases.

Whether you choose Assistants depends on whether you want to manage your own vector DB, and whether you want your threads managed for you as well.

Sorry, your responses are not helpful. Yes, I know what a custom GPT is and I do NOT want that. I’m building a web AI assistant using a web framework and the OpenAI gem.

Please, my question is about whether one is (A) forced to use the beta Assistants API to store website page HTML in some format and retrieve it with RAG, while also reserving the option to fine-tune with JSONL files in Q&A format, or (B) able to use ChatGPT3.5 or ChatGPT4 with the Chat Completions API.

Please respond with just one char, A or B. Then explain why in a paragraph. lol

Sorry boss, I’m not gonna do that.

Good luck with your app!

For anyone following this post, I still find it very confusing but this seems to be one path:

https://platform.openai.com/docs/tutorials/web-qa-embeddings

So, on question (A): you are not forced to use the Assistants API; you can use ChatGPT3.5 with embeddings rather than file uploads.
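For what it's worth, the core of that approach in Python looks roughly like this. It's a sketch only: the file name, chunk size, and embedding model are placeholders I picked, and a large site would need the embedding call batched.

```python
# Build a small embeddings index from the scraped site text.
# Assumes the openai Python package (v1+) with OPENAI_API_KEY set.
from openai import OpenAI

client = OpenAI()

def chunk(text, size=1000):
    # Naive fixed-size chunking; token-aware chunking works better in practice.
    return [text[i:i + size] for i in range(0, len(text), size)]

site_text = open("site.txt", encoding="utf-8").read()  # your scraped pages
chunks = chunk(site_text)

# One embedding per chunk, stored alongside the text for later lookup.
resp = client.embeddings.create(model="text-embedding-ada-002", input=chunks)
index = [(chunks[i], d.embedding) for i, d in enumerate(resp.data)]
```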

And there is a good Ruby version here since they only give curl and Python on the OpenAI website:

Based on this excellent and very clear article:

Thought I’d add my two cents here.

I have created one for my website, and I just used the Assistants API, adding all of my website's content as .txt files.

FYI, I just have one text file with all the content, as each file can be as big as 512 MB. lol.
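If you want to set that up from code, it is roughly this with the openai Python package; the file name, instructions, and model here are placeholders, not my actual values.

```python
# Sketch: one .txt file with the whole site, attached to an assistant that
# uses the beta retrieval tool. Assumes the openai Python package (v1+).
from openai import OpenAI

client = OpenAI()

# Upload the site content once; OpenAI chunks and embeds it for you.
site_file = client.files.create(file=open("site.txt", "rb"), purpose="assistants")

assistant = client.beta.assistants.create(
    model="gpt-4-1106-preview",
    instructions="Only answer questions about the content of example.com.",
    tools=[{"type": "retrieval"}],
    file_ids=[site_file.id],
)
```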

You do not need to fine-tune it, as you would then have to generate a lot of synthetic data.

Hope this helps.

Do you clean up the HTML or just save it as-is in a .txt? Do you remove tags, new lines, spaces, etc.?
And with everything in one file, you are getting good responses, with no fine-tuning? And with which model?

I don't clean the file. Effectively, it's all vectorized, and spaces, new lines, etc. do not matter.
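That said, if you do want to strip the markup before uploading, a quick sketch with BeautifulSoup would be something like this (not what I do, just an option):

```python
# Optional cleanup: turn raw page HTML into plain text before saving to .txt.
from bs4 import BeautifulSoup

def html_to_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()  # drop inline JS and CSS blocks
    # Collapse the remaining text and squeeze whitespace.
    return " ".join(soup.get_text(separator=" ").split())
```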

I use GPT4 for my assistant. I have tested it with 3.5 as well, but I like the responses from GPT4 better. It’s a choice and I don’t mind a few extra dollars each month.

And yes. Good responses for most questions and it doesn’t hallucinate as much.

There is always the caveat that users can try to break it. But as long as you are just using it for your website data you are fine.

Have you tried this method yet? Is its performance better than the Assistants API approach?

Not yet. I will be working on this over the next month or so. Check back. I'm going to try embeddings first, following that Rails tutorial, and play with the resulting chat. Then maybe I'll try Assistants if that sucks. I am waiting for OpenAI to figure out Assistants and be really clear on:
1.) Costs
2.) Benefits over embeddings

Like, why did they create Assistants if embeddings already exist and work? What doesn't work with embeddings that Assistants does better? I am still not clear.

A. One is not. It is the model which is fine-tuned, and the API can select whichever model you want.
B. One can use either or both with the Chat Completion API.
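In other words, with Chat Completions you do the retrieval yourself and pass whatever model name you like, including a fine-tuned one. A bare-bones sketch, assuming `index` is a list of (text, embedding) pairs you built with text-embedding-ada-002, and with a made-up fine-tuned model id:

```python
# Retrieve context yourself, then call Chat Completions with any model,
# including a fine-tuned one. Assumes the openai Python package (v1+).
from openai import OpenAI
import numpy as np

client = OpenAI()

def top_chunk(question, index):
    q = client.embeddings.create(model="text-embedding-ada-002",
                                 input=question).data[0].embedding
    scores = [np.dot(q, emb) for _, emb in index]  # ada-002 vectors are unit length
    return index[int(np.argmax(scores))][0]

def answer(question, index):
    context = top_chunk(question, index)
    resp = client.chat.completions.create(
        model="ft:gpt-3.5-turbo-1106:acme::abc123",  # placeholder; or plain gpt-3.5-turbo / gpt-4
        messages=[
            {"role": "system",
             "content": f"Answer only from this website excerpt:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```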

I know, it’s kind of overwhelmingly confusing. I was where you are a year ago. Well, I think you’re way ahead of where I was. I hope these might help:

And, if you do decide to put together your own Chat Completions RAG:

GPT4 Tutorial: How to chat with multiple pdf files - The Chat Completion Process (R.A.G. / Embeddings)

Theoretically, you should be able to do the same with the Assistants API, but I've never used that, so someone familiar with that architecture can best answer that question.

P.S. Don’t get too frustrated with the folks here. I know sometimes it seems like you’re speaking English and they are replying in Russian. Most people here are really trying to help you. Like in most things, we all tend to forget how clueless we were in the very beginning.

I think this Tech Talks blog post, "How to fine-tune LLMs for better RAG performance", helps here.

I uploaded my data to the OpenAI Assistants playground, asked it a question, and it said "Retrieving …" and then found an answer very specific to our organization.

Also, it kind of sounded a little bit more like me after I uploaded my data. I am curious whether there is definitive evidence that Assistants do fine-tuning and RAG under the hood, which may reduce the need to implement either for most use cases.

Is it able to detect which uploads are chat transcripts relevant for fine-tuning, while embedding all the uploaded data files for RAG?