Consequences of Assistants API for LangChain and vector database

I’m building a domain specific chatbot assistant with some hundreds of documents.
I understand I can supply those documents to the assistant, so embeddings are no longer needed.
The bot needs to be fine-tuned to respond in a defined style.
My question is, do I still need LangChain with a vector database like weaviate, or can I leave them out?

Fine-tuned models cannot currently be used with assistants.

Embeddings are still used for “retrieval” on uploaded “assistant files”.

You cannot “attach” more than 20 files per assistant, but they can be quite large.

You are billed per day for storage…times the number of assistants the files are attached to.

You have no control over the amount of chat context, metadata enhancement, or prompt enhancement techniques used for the semantic search retrieval.

The model will be filled to the max context length with retrieval regardless of quality.

In short: there are only about 100 different reasons why you would NOT want to use assistants.
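To put the storage billing point above in concrete terms, here is a back-of-envelope sketch. The $0.20 per GB per assistant per day rate is an assumption based on the pricing announced at launch; verify it against the current OpenAI pricing page before relying on it.

```python
# Rough sketch of per-day storage billing for assistant files.
# ASSUMPTION: $0.20 / GB / assistant / day (check current OpenAI pricing).
RATE_PER_GB_PER_DAY = 0.20

def monthly_storage_cost(total_gb: float, num_assistants: int, days: int = 30) -> float:
    """Cost = file size x daily rate x number of assistants attached x days."""
    return total_gb * RATE_PER_GB_PER_DAY * num_assistants * days

# Example: 20 files totalling 10 GB, attached to 3 assistants, for a month:
print(f"${monthly_storage_cost(10, 3):.2f}")  # -> $180.00
```

So the same documents attached to several assistants are billed several times over, which adds up quickly at hundreds of documents.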


Got it. So the assistants API is not an adequate replacement for LangChain and local vector database. That’s a pity as it would have simplified the technical implementation a lot.

Hi aringa, you could potentially use this python code to get a maximum of 20 chunk files from all your files:

import os
import tiktoken

def create_chunks(directory_path, model_name="gpt-4", max_tokens_per_chunk=1500000, max_chunks=20):
    # Get the tokenizer encoding for the specified model
    encoding = tiktoken.encoding_for_model(model_name)

    # Initialize variables
    chunks = []
    current_chunk = ""
    current_token_count = 0
    chunk_count = 0

    # Iterate over each file in the directory
    for file_name in os.listdir(directory_path):
        file_path = os.path.join(directory_path, file_name)

        # Ensure that it's a file
        if os.path.isfile(file_path):
            # Read the file
            with open(file_path, 'r') as file:
                text = file.read()

            # Divide the text into chunks based on tokens
            for line in text.split('\n'):
                line_tokens = encoding.encode(line)
                line_token_count = len(line_tokens)

                if current_token_count + line_token_count > max_tokens_per_chunk:
                    # Current chunk is full: store it and start a new one
                    chunks.append(current_chunk)
                    current_chunk = line + '\n'
                    current_token_count = line_token_count
                    chunk_count += 1

                    if chunk_count == max_chunks:
                        break
                else:
                    current_chunk += line + '\n'
                    current_token_count += line_token_count

            # Stop reading further files once the chunk limit is reached
            if chunk_count == max_chunks:
                break

    # Keep the final, partially filled chunk
    if current_chunk and chunk_count < max_chunks:
        chunks.append(current_chunk)

    # Save each chunk to a separate text file
    for i, chunk in enumerate(chunks):
        chunk_file_path = f'chunk{i+1}.txt'
        with open(chunk_file_path, 'w') as chunk_file:
            chunk_file.write(chunk)
# Example usage
create_chunks('/path/to/directory', model_name="gpt-4")

based on this tutorial: Tutorial

This is interesting, I didn’t know that. Do you have a reference by any chance?

It’s the old “I came here with questions so I can doubt the answers”.

You mean like an “API reference”? Yeah, I’ve got one. There’s a link on the side of the forum.

Once there, you can either read the technical reference, or go to the top bar, choose Documentation, and find the Assistants and Threads pages there as well.

You’ll find that both threads and retrieval promise to fill the AI context to the max. That is especially bad if you let random people at a chatbot with no “hang-up” function: you have lots of documents about your business to share with potential or existing customers, and every one of their long chats pulls in as much of it as will fit.

The engagement can cost you more than a human.
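As a rough illustration of that last point: if retrieval and thread history really do stuff the context to its limit on every turn, the input cost per user message is easy to estimate. The 128k window and the $0.01 per 1K input tokens rate below are assumptions (roughly GPT-4 Turbo at the time of writing); check current model limits and pricing.

```python
# Back-of-envelope cost of one fully stuffed context call.
# ASSUMPTIONS: 128k context window, $0.01 / 1K input tokens (verify current pricing).
CONTEXT_TOKENS = 128_000
INPUT_RATE_PER_1K = 0.01

cost_per_turn = CONTEXT_TOKENS / 1000 * INPUT_RATE_PER_1K
print(f"${cost_per_turn:.2f} per user turn")  # -> $1.28 per user turn
```

At over a dollar of input tokens per message, a chatty visitor can indeed cost you more than a human agent would.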