Seeking Advice on Building a RAG Chatbot

Hey everyone,

I’m a math major at the University of Chicago, and I’m interested in helping my school with academic scheduling. I want to build a Retrieval-Augmented Generation (RAG) chatbot that can assist students in planning their academic schedules. The chatbot should be able to understand course prerequisites, course times, and the terms in which courses are offered. For example, it should provide detailed advice on the courses listed in our mathematics department catalog: University of Chicago Mathematics Courses.

This project boils down to building a reliable RAG chatbot. I’m wondering if anyone knows any RAG techniques or services that could help me achieve this outcome—specifically, creating a chatbot that can inform users about course prerequisites, schedules, and possibly the requirements for the bachelor’s track.

Could the solution involve structuring the data in a specific way? For instance, scraping the website and creating a separate file containing an array of courses with their prerequisites, schedules, and quarters offered.
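
A rough sketch of what that separate file could look like. The field names (`code`, `prerequisites`, `quarters`, `meeting_time`) and the course codes are hypothetical placeholders I made up, not real catalog data, and `missing_prereqs` is just an illustrative helper:

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical schema for one scraped catalog entry (field names are assumptions).
@dataclass
class Course:
    code: str
    title: str
    prerequisites: list[str] = field(default_factory=list)
    quarters: list[str] = field(default_factory=list)
    meeting_time: str = ""

# Placeholder records only: codes, titles, and times are invented, NOT the real catalog.
COURSES = [
    Course("MATH A", "Placeholder Calculus I", [], ["Autumn"], "MWF 10:30"),
    Course("MATH B", "Placeholder Calculus II", ["MATH A"], ["Winter"], "MWF 10:30"),
    Course("MATH C", "Placeholder Analysis", ["MATH B"], ["Autumn", "Spring"], "TR 9:30"),
]

def missing_prereqs(code, completed):
    """Return the prerequisites of `code` that are not in the `completed` set."""
    by_code = {c.code: c for c in COURSES}
    return [p for p in by_code[code].prerequisites if p not in completed]

def catalog_to_json(courses):
    """Serialize the course array; write the result to e.g. a courses.json file."""
    return json.dumps([asdict(c) for c in courses], indent=2)
```

For example, `missing_prereqs("MATH C", {"MATH A"})` returns `["MATH B"]`.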

Overall, I’m very keen on building this chatbot because I believe it would be valuable for me and my peers. I would appreciate any advice or suggestions on what I should do or what services I could use.

Thank you!

Hey Guido,

When people talk about RAG, they usually mean things like vector databases and retrieval through embeddings.
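
To make the embedding part concrete, here is a minimal sketch of the retrieval step: rank stored chunks by cosine similarity to the query vector and keep the top k. The 3-dimensional vectors and chunk texts are toy placeholders; in a real setup each vector would come from an embedding model, and a vector database would do this search at scale.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, k=2):
    """Return the texts of the k chunks whose vectors are most similar to query_vec."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy vectors standing in for real embeddings (placeholder data).
INDEX = [
    ("Placeholder chunk about prerequisites", [0.9, 0.1, 0.0]),
    ("Placeholder chunk about meeting times", [0.1, 0.9, 0.1]),
    ("Placeholder chunk about major requirements", [0.0, 0.2, 0.9]),
]
```

The retrieved chunks are then pasted into the prompt as context for the model's answer.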

In your case, I wonder whether it would make more sense to create an API that serves this information. Based on your description, it should only need a small number of keys, which would fit nicely into function calling:

https://platform.openai.com/docs/guides/function-calling
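
A minimal sketch of that approach, assuming the Chat Completions API: you pass a JSON-schema tool definition via the `tools` parameter of `client.chat.completions.create(...)`, the model replies with `tool_calls` whose `function.arguments` is a JSON string, and you run the function yourself and send the result back in a `"tool"` role message. The function name `get_course_info`, its parameter, and the in-memory data are hypothetical; the network call itself is left out so the sketch runs offline.

```python
import json

# Hypothetical in-memory data; a real version would query your course database.
COURSE_DB = {
    "MATH B": {"prerequisites": ["MATH A"], "quarters": ["Winter"]},
}

# Tool definition in the shape accepted by the `tools` parameter.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_course_info",
            "description": "Look up prerequisites and offered quarters for a course code.",
            "parameters": {
                "type": "object",
                "properties": {
                    "course_code": {"type": "string", "description": "Course code, e.g. 'MATH B'"},
                },
                "required": ["course_code"],
            },
        },
    }
]

def get_course_info(course_code):
    return COURSE_DB.get(course_code, {"error": f"unknown course {course_code}"})

def handle_tool_call(name, arguments_json):
    """Dispatch a tool call from the model; its arguments arrive as a JSON string."""
    args = json.loads(arguments_json)
    if name == "get_course_info":
        return json.dumps(get_course_info(**args))
    raise ValueError(f"unknown tool: {name}")
```

Because the model only ever sees what `get_course_info` returns, answers about prerequisites come from your data rather than from the model's memory.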

The nice thing about an API is that it can sit on top of a malleable database that holds this information and can easily be updated with new information.
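
As one possible backing store, here is a sketch using Python's built-in `sqlite3`. The three-table schema and function names are my own assumptions for illustration; the point is that re-saving a course replaces its old rows, so updating the catalog is a single call.

```python
import sqlite3

def make_db(path=":memory:"):
    """Create a hypothetical schema: courses plus prerequisite and offering link tables."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS courses (code TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE IF NOT EXISTS prereqs (course TEXT, requires TEXT);
        CREATE TABLE IF NOT EXISTS offerings (course TEXT, quarter TEXT);
    """)
    return conn

def save_course(conn, code, title, prereqs, quarters):
    """Insert or update a course, replacing its old prerequisite and quarter rows."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO courses VALUES (?, ?)", (code, title))
        conn.execute("DELETE FROM prereqs WHERE course = ?", (code,))
        conn.execute("DELETE FROM offerings WHERE course = ?", (code,))
        conn.executemany("INSERT INTO prereqs VALUES (?, ?)", [(code, p) for p in prereqs])
        conn.executemany("INSERT INTO offerings VALUES (?, ?)", [(code, q) for q in quarters])

def offered_in(conn, quarter):
    rows = conn.execute(
        "SELECT course FROM offerings WHERE quarter = ? ORDER BY course", (quarter,)
    )
    return [r[0] for r in rows]

def prerequisites_of(conn, code):
    rows = conn.execute(
        "SELECT requires FROM prereqs WHERE course = ? ORDER BY requires", (code,)
    )
    return [r[0] for r in rows]
```

An API endpoint (or the function-calling handler) would then just call `offered_in` or `prerequisites_of`.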

Embeddings are usually better suited to unstructured data. Either way, Markdown is a commonly recommended format for the documents: it's simple, explicit, and models have seen plenty of it during training.
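
If you do go the document route, one option is to render each structured course record as its own small Markdown section, so a retrieved chunk always contains a complete entry. This is a sketch; the dictionary keys are assumed to match whatever your scraper produces.

```python
def course_to_markdown(course):
    """Render one course dict as a self-contained Markdown section."""
    prereqs = ", ".join(course["prerequisites"]) or "None"
    quarters = ", ".join(course["quarters"]) or "Not listed"
    return (
        f"## {course['code']}: {course['title']}\n"
        f"- **Prerequisites:** {prereqs}\n"
        f"- **Quarters offered:** {quarters}\n"
    )

def catalog_to_markdown(courses):
    """Join all course sections into one Markdown document, one section per course."""
    return "\n".join(course_to_markdown(c) for c in courses)
```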

If you still intend to transform documents into embeddings and retrieve them that way, you can always start with a Custom GPT that has Retrieval enabled, upload some documents, and experiment. Once the idea is proven, you can move over to the Assistants API, which is conceptually the same as a Custom GPT.

Lastly, you can feed all of your documents to ChatGPT (copy and paste) and ask it to produce questions of varying difficulty to use as a test set for measuring how well the current system performs.
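
Once you have those questions, a crude way to score the system is to pair each question with terms the correct answer must mention and count how many answers contain all of them. This is a sketch: `answer_fn` is a hypothetical wrapper around your chatbot, and keyword matching is a rough proxy, so it's worth reviewing failures by hand.

```python
def evaluate(test_set, answer_fn):
    """Score a chatbot on (question, expected_terms) pairs.

    Returns (fraction passed, list of failed questions). A question passes if the
    answer contains every expected term, case-insensitively.
    """
    if not test_set:
        return 0.0, []
    failures = []
    for question, expected_terms in test_set:
        answer = answer_fn(question).lower()
        if not all(term.lower() in answer for term in expected_terms):
            failures.append(question)
    return 1 - len(failures) / len(test_set), failures
```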
