Tutorial: A Step-by-Step Guide on RAG Using SurrealDB

Do you want to build a chatbot using retrieval-augmented generation (RAG)?

Starting a project and can’t decide between relational, object-oriented, hierarchical, network, NoSQL, column-family, document-oriented, graph, time-series, or in-memory databases? Why not throw all those choices in a blender and pour yourself a tall glass of SurrealDB? If you, like myself, are a connoisseur of all the things, then you will appreciate SurrealDB’s query language, which replaces the ‘S’ in SQL with “surreal” and supports SQL, NoSQL, and GraphQL queries.

Build a RAG chatbot using SurrealDB

This tutorial walks you through setting up a SurrealDB instance using Docker, uploading some data, and querying it using function/tool calls to find and retrieve the closest embeddings based on cosine similarity.

Prerequisites

5-minute quick start guide:

Those of you who want to skip directly to the results can find the entire codebase, with instructions, in the following GitHub repository:
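You’ll also need Docker, Python 3.8+, and the Python packages used by the scripts below. This list is inferred from their imports; note that the scripts use the older async API of the surrealdb package, so if the latest SDK has changed, you may need to pin an older release (e.g. surrealdb==0.3.2):

pip install surrealdb openai python-dotenv requests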

Step 1: Create Docker Compose File

Create a file named docker-compose.yml in your project directory and paste the following configuration into it:

version: '3'

services:
  surrealdb:
    image: surrealdb/surrealdb:latest
    entrypoint:
      - /surreal
      - start
      - --auth
      - --log
      - trace
      - --user
      - $DB_USER
      - --pass
      - $DB_PASSWORD
      - memory # This starts SurrealDB in memory mode. Remove "memory" for persistent storage.
    ports:
      - "8000:8000"
    env_file:
      - .env # Ensure you have a .env file in the same directory as your docker-compose.yml

This configuration defines a single service, surrealdb, that uses the latest SurrealDB image. It specifies startup options for authentication, logging level, and database credentials, which are read from an environment file, .env. The memory flag means SurrealDB runs in-memory only, which is not persistent; I suggest you remove it if you want to keep your database afterwards.
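If you do want persistence, here’s a minimal sketch of the changes; the file: path and the volume name are my own choices, so adjust them to taste:

version: '3'

services:
  surrealdb:
    image: surrealdb/surrealdb:latest
    entrypoint:
      - /surreal
      - start
      - --auth
      - --log
      - trace
      - --user
      - $DB_USER
      - --pass
      - $DB_PASSWORD
      - file:/data/database.db # Persistent storage path instead of "memory"
    ports:
      - "8000:8000"
    env_file:
      - .env
    volumes:
      - surrealdb-data:/data # Named volume so the data survives container restarts

volumes:
  surrealdb-data: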

Step 2: Create Environment Variables File

Next, create a file named .env in the same directory as your docker-compose.yml. Add the following content to specify the database credentials:

DB_USER=root
DB_PASSWORD=root
OPENAI_API_KEY="your OpenAI API key"

You can change the DB_USER and DB_PASSWORD values to suit your preferences. These credentials will be used to access your SurrealDB instance.

Step 3: Launch SurrealDB

With the docker-compose.yml and .env files in place, you’re ready to start your SurrealDB instance. Open a terminal, navigate to your project directory (where your docker-compose.yml is located), and run:

docker-compose up

This command will pull the latest SurrealDB image from Docker Hub (if it’s not already locally available), create a container based on the specifications in your docker-compose.yml, and start the database server. To verify that SurrealDB is running correctly, open your web browser and navigate to http://localhost:8000. You should be greeted by a screen telling you that the SurrealDB web console is coming soon.
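If you’d rather check from the terminal, SurrealDB also exposes a health endpoint, which should return a 200 response while the server and storage engine are up:

curl -i http://localhost:8000/health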

Step 4: Upload Some Data

We will use a “small dataset”: the entire works of Shakespeare (only 5.33 MB), which can be found at the URL used in the script below.

The script below starts by downloading the complete works of Shakespeare, splits this text into chunks, and then asynchronously connects to a SurrealDB database. Using OpenAI’s API, it generates embeddings for each text chunk. These chunks and their embeddings are uploaded to the database, with progress messages printed during the process. Finally, the script retrieves and prints information about the database and the text_embeddings table.

import requests
import re
import os
import asyncio
from surrealdb import Surreal
from openai import OpenAI
from dotenv import load_dotenv

collection_name = "text_embeddings"
text_field_name = "text"
embedding_field_name = "embedding"
model = "text-embedding-3-small"


def download_text(url):
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    else:
        print(f"Failed to download the text. Status code: {response.status_code}")
        return ""

def chunk_text(text):
    # Split on runs of three consecutive newlines; the non-capturing group
    # keeps the separators out of the result list
    chunks = re.split(r'(?:\r?\n){3}', text)
    non_empty_chunks = [chunk.strip() for chunk in chunks if chunk.strip()]
    return non_empty_chunks

async def create_embedding(openai_client, query_string, model=model):
    response = openai_client.embeddings.create(
        input=query_string,
        model=model
    )
    query_embedding = response.data[0].embedding
    return query_embedding

async def save_text_and_embedding(db, text, embedding, collection_name=collection_name, text_field_name=text_field_name, embedding_field_name=embedding_field_name):
    data = {
        text_field_name: text,
        embedding_field_name: embedding,
    }
    await db.create(collection_name, data)

async def db_info(db):
    query = "INFO FOR DB;"
    try:
        results = await db.query(query)
        print(results)
    except Exception as e:
        print(f"There was a problem fetching the database info: {e}")

    query = f"INFO FOR TABLE {collection_name};"
    try:
        results = await db.query(query)
        print(results)
    except Exception as e:
        print(f"There was a problem fetching the table info: {e}")

async def upload_text(db, openai_client, chunks, collection_name=collection_name, text_field_name=text_field_name, embedding_field_name=embedding_field_name, model=model):
    print(f"Uploading chunks... (this may take a while)")
    for chunk in chunks:
        try:
            embedding = await create_embedding(openai_client, chunk, model)
            await save_text_and_embedding(db, chunk, embedding, collection_name, text_field_name, embedding_field_name)
            print(f"Uploaded chunk: {chunk[:42]}...")
        except Exception as e:
            print(f"Failed to upload chunk. Error: {e}")
 
async def main():
    load_dotenv()
    url = "https://raw.githubusercontent.com/borkabrak/markov/master/Complete-Works-of-William-Shakespeare.txt"

    shakespeare_text = download_text(url)

    chunks = chunk_text(shakespeare_text)
    
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({
            "user": os.getenv("DB_USER", "default_username"), 
            "pass": os.getenv("DB_PASSWORD", "default_password")
        })
        await db.use("test", "test")

        openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

        await upload_text(db, openai_client, chunks)
        await db_info(db)

if __name__ == "__main__":
    asyncio.run(main())
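One thing to note: the retrieval script in the next step scans the whole table and sorts by similarity, which is fine for a demo. If you’re on SurrealDB 1.1 or later, you can also define an M-tree vector index. Here’s a minimal sketch reusing the names from the script above (the index name is my own, and to actually benefit from the index at query time you’d use SurrealDB’s KNN operator rather than ORDER BY):

async def create_vector_index(db):
    # text-embedding-3-small produces 1536-dimensional vectors
    query = f"""
    DEFINE INDEX embedding_idx ON {collection_name}
    FIELDS {embedding_field_name} MTREE DIMENSION 1536 DIST COSINE;
    """
    try:
        results = await db.query(query)
        print(results)
    except Exception as e:
        print(f"There was a problem creating the index: {e}")

You could call this from main() right after the upload finishes; INFO FOR TABLE should then list the index.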

Step 5: Make a Chatbot…

The script below sets up a simple chatbot that talks to users while using tool/function calls to retrieve knowledge from the database.

import asyncio
import json
import os
from surrealdb import Surreal
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
DB_USER = os.getenv('DB_USER', 'root')
DB_PASSWORD = os.getenv('DB_PASSWORD', 'root')
client = OpenAI(api_key=OPENAI_API_KEY)
collection_name = "text_embeddings"
text_field_name = "text"
embedding_field_name = "embedding"

async def create_embedding(client, query_string, model="text-embedding-3-small"):
    response = client.embeddings.create(input=query_string, model=model)
    query_embedding = response.data[0].embedding
    return query_embedding

async def search_embeddings(db, query_embedding, top_n, collection_name=collection_name, embedding_field_name=embedding_field_name, additional_fields="*", order_by="cosine_similarity DESC"):
    select_fields = f"{additional_fields}, vector::similarity::cosine({embedding_field_name}, {query_embedding}) as cosine_similarity"
    
    query = f"""
    SELECT {select_fields} 
    FROM {collection_name}
    ORDER BY {order_by}
    LIMIT {top_n};
    """
    results = await db.query(query)
    return results

async def query_database(query, top_n=1):
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({"user": DB_USER, "pass": DB_PASSWORD})
        await db.use("test", "test")
        texts = []
        query_embedding = await create_embedding(client, query)
        search_results = await search_embeddings(db, query_embedding, top_n=top_n)
        for item in search_results[0]['result']:  # Results are nested under 'result'
            text = item.get('text', 'N/A')  # Default to 'N/A' if not found
            texts.append({'text': text})
        return texts
    
messages=[
    {
    "role": "system",
    "content": "You have the tool `read_document`. Use `read_document` in the following circumstances:\n    -ALWAYS\n\nGiven a query that requires retrieval from the documentation, your turn will consist of two steps:\n1. Call the `read_document` command with a query string to retrieve information from the document.\n2. Write a response to the user based on the retrieved information.\n\nThe `read_document` command has the following format:\n    `read_document query: str` Retrieves information relevant to the query from the provided documentation. This tool is designed to access a broad range of information, ensuring responses are informed by the documentation’s content. \n\nYou are tasked with the role of a Shakespearean assistant, equipped with the ability to directly access and quote any part of Shakespeare's works. Your main responsibility is to always quote Shakespeare and respond in the style of Shakespeare.\n- Always be polite, professional, and respectful to users.\n- Provide accurate, clear, and concise information.\n- If you encounter a request that violates guidelines or you're unsure about, politely decline to provide the requested content or information.\n- Continuously learn from user interactions to improve your responses and the user experience."
    },
    {
    "role": "user",
    "content": "write 3 senteces about how amazing this tutorial has been, make sure its inspired by the works of shakespeare"
    }
]    
tools = [
    {
        "type": "function",
        "function": {
            "name": "read_documents",
            "description": "Retrieves documents based on a query.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Query text.",
                    },
                },
                "required": ["query"],
            },
        },
    },
]


async def run_conversation(messages,tools):
    has_tool_calls = True

    while has_tool_calls:
        response = client.chat.completions.create(
            model="gpt-4-turbo-preview",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )
        response_message = response.choices[0].message
        tool_calls = response_message.tool_calls

        if not tool_calls:
            has_tool_calls = False
            messages.append(response_message)
        else:
            available_functions = {
                "read_documents": query_database,
            }
            messages.append(response_message)

            for tool_call in tool_calls:
                function_name = tool_call.function.name
                function_to_call = available_functions.get(function_name)
                if function_to_call:
                    function_args = json.loads(tool_call.function.arguments)
                    if function_name == "read_documents":
                        function_response = await function_to_call(function_args.get("query"))
                        function_response = json.dumps({"text": function_response})
                        messages.append(
                            {
                                "tool_call_id": tool_call.id,
                                "role": "tool",
                                "name": function_name,
                                "content": function_response,
                            }
                        )

    return messages

if __name__ == "__main__":
    print(asyncio.run(run_conversation(messages,tools)))
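The script prints the entire message list at the end; if you only want the assistant’s final reply, you could swap the last two lines for something like this (a small convenience tweak of my own, not part of the original):

if __name__ == "__main__":
    final_messages = asyncio.run(run_conversation(messages, tools))
    # The last entry is the assistant's reply; earlier ones include the tool calls and results
    print(final_messages[-1].content)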

That’s it! In this tutorial, we’ve built a simple chatbot that uses Retrieval-Augmented Generation (RAG), powered by SurrealDB. Feel free to ask if you have any questions.


Thanks for the writeup. Sharing information instead of complaints is nice to see.

The system message appears patterned on language you might see in an OpenAI context, like a copy of the Python tool description from Code Interpreter.

However, that language and phrasing belong in the “# tools” section of OpenAI’s injection, not in natural prompt language, and are marked with // comments to identify them. They are also reinforced by fine-tuning so strong that the AI will write Python with no additional text at all.

For you, the descriptive text about a function should all go into a multi-line function description (with escaped linefeeds), and the proper use of each parameter should go into its own description in the function specification. That will give the model a better understanding, as sketched below.
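To make that concrete, here’s a rough sketch of what the tool specification could look like with the instructions moved out of the system message (the exact wording is illustrative, not a tested prompt):

tools = [
    {
        "type": "function",
        "function": {
            "name": "read_documents",
            "description": (
                "Retrieves passages from the complete works of Shakespeare "
                "that are relevant to the query.\n"
                "ALWAYS call this before answering a question that requires "
                "knowledge of the text, then quote the retrieved passage in "
                "your reply."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "A short phrase or topic to search for, e.g. 'love' or 'betrayal'.",
                    },
                },
                "required": ["query"],
            },
        },
    },
]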


Agreed, sometimes it seems like we’re functioning as the OpenAI Support/Complaints forum instead of a Developer forum.

That or it’s a constant stream of “when will Sora be out” or whatever the new flavour of the week is.

Great post @N2U, I haven’t actually made a RAG implementation before, so I’ll be using this!


Great demonstration, clean code. Nice! SurrealDB looks pretty neat


Thanks y’all :heart:

Yeah, I was kinda curious what would happen; it seems to favor making single-word queries, which I find kinda interesting :thinking:

There’s definitely room for improvement here, my main goal is just to give people something they can play with to learn the basics.

I’m happy to hear that! You will probably have to add some things to get it fully ready for production; remember to add a volume to the Docker Compose file and remove the - memory flag if you actually want to keep your database :sweat_smile:

You should try it! The entire source is written in Rust and it’s fast AF, so I can definitely recommend it. The downsides are that it doesn’t have a web interface (yet) and that it’s hard to find places that will host it.


Now, instead of just giving the AI a search function, where you pay twice as much because the AI has to run twice and has no way of knowing whether it will find the information it wants, you simply auto-inject relevant information above a similarity threshold into the input.

That is the “augmented generation” part of RAG: augmenting the contextual knowledge before language generation.
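A minimal sketch of that approach, reusing create_embedding and search_embeddings from the chatbot script above (the threshold value and prompt wording are my own guesses, and the threshold in particular is something you’d want to tune against your own data):

async def answer_with_context(db, client, user_query, threshold=0.3, top_n=3):
    # Retrieve candidate chunks once, before generation
    query_embedding = await create_embedding(client, user_query)
    results = await search_embeddings(db, query_embedding, top_n=top_n)
    # Keep only chunks above the similarity threshold
    context = "\n\n".join(
        item["text"]
        for item in results[0]["result"]
        if item.get("cosine_similarity", 0) >= threshold
    )
    # Single completion call with the retrieved context injected up front
    response = client.chat.completions.create(
        model="gpt-4-turbo-preview",
        messages=[
            {"role": "system", "content": f"Answer using this context where relevant:\n{context}"},
            {"role": "user", "content": user_query},
        ],
    )
    return response.choices[0].message.content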


Indeed,

This is also what I’m doing in production. The method on display here, though, is more akin to what you’d find in ChatGPT; it costs twice as much as the regular method, but may be helpful for people who want to learn about tool/function calls. :laughing:

Here’s a script for those of you who just want to query the database for fun:

import os
import asyncio
from surrealdb import Surreal
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
openai_client = OpenAI(api_key=OPENAI_API_KEY)

query = "love"
collection_name = "text_embeddings"
text_field_name = "text"
embedding_field_name = "embedding"
model = "text-embedding-3-small"

async def create_embedding(openai_client, query_string, model=model):
    response = openai_client.embeddings.create(
        input=query_string,
        model=model
    )
    query_embedding = response.data[0].embedding
    return query_embedding

async def search_embeddings(db, query_embedding, top_n, collection_name=collection_name, embedding_field_name=embedding_field_name, additional_fields="*", order_by="cosine_similarity DESC"):
    select_fields = f"{additional_fields}, vector::similarity::cosine({embedding_field_name}, {query_embedding}) as cosine_similarity"
    
    query = f"""
    SELECT {select_fields} 
    FROM {collection_name}
    ORDER BY {order_by}
    LIMIT {top_n};
    """
    results = await db.query(query)
    print(results)
    return results

async def query_database(openai_client, query, top_n=3):
    async with Surreal("ws://localhost:8000/rpc") as db:
        await db.signin({
            "user": os.getenv("DB_USER", "default_username"), 
            "pass": os.getenv("DB_PASSWORD", "default_password")
        })

        await db.use("test", "test")

        # Fetch embeddings and cosine similarity results
        query_embedding = await create_embedding(openai_client, query, model)
        vector_results = await search_embeddings(db, query_embedding, top_n=top_n * 2)
        for item in vector_results[0]['result']:
            text = item.get('text', 'N/A')  # Default to 'N/A' if not found
            cosine_similarity = item.get('cosine_similarity', 0)  # Default to 0 if not found
            print(f"Cosine Similarity: {cosine_similarity}, Text: \n {text}")
        time = vector_results[0].get('time', 0)  # Query execution time reported by SurrealDB
        print(f"Time: {time}")
        return text

if __name__ == "__main__":
    asyncio.run(query_database(openai_client, query))

(This isn’t on GitHub, so you’ll have to copy it from here if you want it.)
