Context Limit Token Issue in openai.ChatCompletion.create API Call

Hi,

I am writing a Python program that generates documentation for a given program from its source code. The source code of the entire program takes around 100,000 tokens. I have written prompt instructions on how to create the documentation and what needs to be included; this prompt takes around 2,000 tokens.

I call openai.ChatCompletion.create to generate the documentation, but I get a context-limit-exceeded error saying my request is larger than the model's 16,385-token limit.

This is my API Call:

Function to generate documentation for each section

import openai

def generate_documentation(prompt, summarized_code):
    try:
        full_prompt = f"{prompt}\n\nSource Code:\n{summarized_code}"
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": full_prompt}
            ],
            max_tokens=3000,
            n=1,
            stop=None,
            temperature=0.7,
        )
        return response["choices"][0]["message"]["content"]
    except openai.error.OpenAIError as e:
        print(f"API call failed: {e}")
        return ""

I have already tried different options like summarizing and splitting the source code, but they do not work. Splitting the code does not generate correct documentation, so that is not an option.

My requirement is only fulfilled if the complete source code is considered in one go. In some cases, the source code might take 200,000 tokens.

I would like to know the alternative ways and best practices for achieving this goal, where my complete source code file is passed to this API and the documentation is generated in one go.

I have also tried to explore vector databases, but I don’t have much knowledge of them. I want expert opinions on how to handle this issue. I am also fine with uploading the source code file somewhere if needed.

Any help will be highly appreciated.

Thanks,
SalAi

Welcome to the Forum!

Have you considered switching to a model with a larger context window, such as gpt-4o or gpt-4-turbo? They both allow up to 128k tokens, including up to 4k output tokens.
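
If you go that route, only the model name in your call needs to change. A minimal sketch, assuming the legacy openai SDK style from your snippet (reading the reply at the end is just illustrative):

response = openai.ChatCompletion.create(
    model="gpt-4o",  # 128k-token context window
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": full_prompt},
    ],
    max_tokens=3000,  # output budget; input + output must still fit within the context window
    temperature=0.7,
)
documentation = response["choices"][0]["message"]["content"]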

In my case, I need to make only one API call, and my source code contains 100K to 300K tokens, so even the upgraded models will not work.

I want to know if there is any other way, like storing this source code file somewhere else and passing it to the API, or maybe in some vector database.

How are other people handling large requests? I know they have big PDF, Excel, and Word documents, and sometimes complete websites. How do they handle them?

Thanks,
SalAI

Any help please? The rest of my functionality is complete; I am only stuck on this token issue.

Can I use any other option, like storing my source code in a vector database and deleting it after the documentation is generated, or something else?

Thanks,
SalAI

Hi,

There is currently no way of passing information via a side channel to avoid it being included in the main prompt.

Current-generation models are stateless and require all information for a prompt to be included with every API call; nothing is retained across calls.

The input context limit is 128k tokens, and that must be respected for all calls.
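
A cheap way to see where you stand before each call is to count tokens locally. A minimal sketch, assuming the tiktoken library is installed and full_prompt is your combined instructions plus source code:

import tiktoken

# Rough pre-flight check before calling the API
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt_tokens = len(enc.encode(full_prompt))

# The input plus the requested output (max_tokens=3000) must fit the model's context window
if prompt_tokens + 3000 > 16385:
    print(f"Prompt is ~{prompt_tokens} tokens: too large for gpt-3.5-turbo")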

Would storing the source code in a RAG/vector DB help in terms of token limits?

My API calls are not back and forth like a chat. I just need to call once to generate the documentation based on my prompt. The only problem is the size of the source code.

RAG and code… it can work sometimes, but embeddings store semantic meaning and lots of code segments can have VERY similar meanings with only a slight change that makes them do wildly different things.

You can try, but I’ve never gotten code and RAG to play nice.
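
For what it’s worth, this is roughly what that experiment looks like with the legacy SDK. A bare-bones sketch where the embedding model, the fixed-size chunking, and the top-k value are all just assumptions to illustrate the idea:

import numpy as np
import openai

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Embed naive fixed-size chunks of the code once up front
chunks = [source_code[i:i + 2000] for i in range(0, len(source_code), 2000)]
chunk_vectors = [embed(c) for c in chunks]

def top_chunks(query, k=5):
    # Return the k chunks most similar to the query by cosine similarity
    q = embed(query)
    scores = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))) for v in chunk_vectors]
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]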

Thanks @Foxalabs. I am trying the RAG option and will share how it goes.
Any other ideas? I have seen many people already creating documentation based on source code. I’m wondering how they have achieved it.

Any help please?

Still working on RAG as an alternate option, but getting errors.

I would like to know if there is any other way that I am missing, and to learn how other people handle source code documentation.

Thanks,

I’m tagging @stevenic as I think something he’s been working on would be helpful.

I’m pretty sure they’ve come up with a way to just feed the entire program in and eventually get an answer out…

We have a service we’re about to launch in preview form which essentially has an infinite input context window. For source code we’ve successfully passed in several million tokens and it’s able to reason over all the files in the program. It does this by distributing the reasoning across multiple model calls but to the user it looks like a single request.

Creating an infinite output context window is a different story. I’m working on it and starting to make progress but it’s definitely challenging.

To generate documentation for source code, you’re not going to be able to do it all in one go. You’ll need to make several passes over the code. You can basically get our service to generate an outline for you in the first pass, and then you can write the docs section by section in each subsequent pass, but it’s a bit like hanging wallpaper: the edges of the sections won’t always line up perfectly, and you can easily end up with inconsistencies across sections. You can smooth these edges out by passing the whole doc back through our service (it does have an infinite context window) and having it rewrite each section to be more consistent.

I’m working to automate all of this but I’m not quite there yet.
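
In rough pseudocode the flow looks something like this; ask_model() here is just a placeholder for whatever chat call (or service) you use, and each call still has to fit that model’s context window:

def document_in_passes(source_code, ask_model):
    # Pass 1: get an outline of the documentation
    outline = ask_model(f"Propose a documentation outline for this code:\n{source_code}")

    # Pass 2: write each section of the outline independently
    sections = []
    for heading in outline.splitlines():
        if heading.strip():
            sections.append(ask_model(
                f"Write the '{heading}' section of the documentation for this code:\n{source_code}"
            ))

    # Pass 3: smooth the seams by rewriting the combined draft for consistency
    draft = "\n\n".join(sections)
    return ask_model(f"Rewrite this documentation so the sections are consistent:\n{draft}")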

Thank you so much @stevenic for the detailed answer.
How can I get access to the service?

Yes, the plan is to go section by section to get the best results. I have already tested the section-by-section approach on smaller programs and it provides the best results.

Thanks
SalAI

We’re planning a waitlist. I’ll post back here once we’ve launched. We’re close.

I asked ChatGPT and its response is here:

To tackle the problem of generating documentation for a large source code file, given the token limit constraints of the GPT model, you can consider the following approaches:

1. Chunking the Code with Overlap:

  • Approach: Split the source code into smaller chunks that are within the token limit and process each chunk separately. To ensure continuity in documentation, you can introduce an overlap between chunks. For example, each chunk could include a small portion of the previous chunk to maintain context.
  • Implementation: Modify your script to split the source code into chunks and process each chunk iteratively. You can then combine the generated documentation.

def chunk_code(source_code, max_tokens=3000, overlap_lines=20, chars_per_token=4):
    # Split the code into chunks of roughly max_tokens tokens (approximated as
    # chars_per_token characters per token) and carry overlap_lines of trailing
    # context into the start of the next chunk.
    chunks = []
    lines = source_code.splitlines()
    current_chunk = []

    for line in lines:
        current_chunk.append(line)
        if len("\n".join(current_chunk)) >= max_tokens * chars_per_token:
            chunks.append("\n".join(current_chunk))
            current_chunk = current_chunk[-overlap_lines:]  # keep the tail as overlap

    if current_chunk:
        chunks.append("\n".join(current_chunk))

    return chunks

def generate_full_documentation(source_code):
    chunks = chunk_code(source_code)
    documentation = ""
    for chunk in chunks:
        documentation += generate_documentation(chunk)  # Assuming your function is updated to handle chunks
    
    return documentation

2. Summarize Sections First:

  • Approach: If certain sections of the code are more verbose or repetitive, you might first generate summaries for those sections, which are then included in the final prompt to reduce the token count.
  • Implementation: Create a helper function to summarize parts of the code before generating the full documentation.

def summarize_code(section):
    summary_prompt = f"Summarize the following code:\n{section}"
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": summary_prompt},
        ],
        max_tokens=3000,
        n=1,
        stop=None,
        temperature=0.7,
    )
    return response['choices'][0]['message']['content']

def generate_full_documentation(source_code):
    # Summarize chunk by chunk so each summarization call also stays within the limit
    summaries = [summarize_code(chunk) for chunk in chunk_code(source_code)]
    return generate_documentation("\n".join(summaries))  # assuming your function accepts the summaries

3. Utilizing Vector Databases (Advanced):

  • Approach: Store the code and associated embeddings in a vector database. Query sections of the code dynamically as needed, which allows the model to fetch relevant parts of the code for generating documentation without exceeding token limits.
  • Implementation: This requires more setup and a good understanding of vector databases like Pinecone, FAISS, etc. However, it can be very effective for large codebases.
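
  • Example: An illustrative sketch, assuming faiss-cpu and numpy are installed and an embed() helper that returns an embedding vector for a string; the chunk size and query are placeholders.

import faiss
import numpy as np

# Naive chunking: 50-line blocks of the source file
lines = source_code.splitlines()
code_chunks = ["\n".join(lines[i:i + 50]) for i in range(0, len(lines), 50)]
vectors = np.array([embed(c) for c in code_chunks], dtype="float32")

# Normalize so inner product equals cosine similarity, then index the chunks
faiss.normalize_L2(vectors)
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)

# Retrieve the 5 chunks most relevant to a documentation query
query = np.array([embed("How is authentication handled?")], dtype="float32")
faiss.normalize_L2(query)
_, ids = index.search(query, 5)
relevant_chunks = [code_chunks[i] for i in ids[0]]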

4. API Call for Each Section (Modular Documentation):

  • Approach: Divide your code into logical sections, such as classes, functions, or modules. Generate documentation for each section independently, which allows you to handle much larger codebases in a modular way.
  • Implementation: Loop through each section and generate documentation individually.

def generate_documentation_for_sections(sections):
    documentation = ""
    for section in sections:
        documentation += generate_documentation(section)
    return documentation

def split_into_sections(source_code):
    # Naive example: split on top-level function/class definitions (adapt to your language)
    import re
    sections = re.split(r"\n(?=def |class )", source_code)
    return sections

def generate_full_documentation(source_code):
    sections = split_into_sections(source_code)
    return generate_documentation_for_sections(sections)

5. Increase Model Context Window:

  • Future Consideration: If larger context windows become available in future versions of the API (such as GPT-4-32k or future models), you could use these to handle larger blocks of code in a single API call.

6. Considerations for External Storage:

  • If you are open to uploading your code file somewhere and handling it externally, you could consider integrating cloud storage (e.g., AWS S3) to store large files and only fetch relevant parts for processing. This way, you don’t need to handle the entire file in a single API call.
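
  • Example: A minimal boto3 sketch (the bucket name and file paths are placeholders). Note that this only moves the file around; whatever you send to the API still has to fit its context window.

import boto3

# Upload the source file once, then fetch it when generating the documentation
s3 = boto3.client("s3")
s3.upload_file("my_program.py", "my-docs-bucket", "source/my_program.py")

obj = s3.get_object(Bucket="my-docs-bucket", Key="source/my_program.py")
source_code = obj["Body"].read().decode("utf-8")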

Conclusion:

Given your requirements, the Chunking the Code with Overlap method is likely the most straightforward approach. It allows you to handle large source files while maintaining the context necessary for accurate documentation generation.

If you have further questions or need help with specific implementation details, feel free to ask!

@stevenic - Just wanted to check if it’s launched. Thanks.

Not yet… DM me and let’s chat about your scenario.

When you have a file of code large enough to exceed the context window, documentation is not really what that software needs. It needs refactoring to get the files down to a maximum of 200 lines of code and methods down to a maximum of 10 lines.

Do that manually. Don’t be lazy. How hard can it be?
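
If you want to see how far off you are, here is a quick sketch that flags files over 200 lines and functions over 10 lines (the src/ path is just an example):

import ast
from pathlib import Path

for path in Path("src").rglob("*.py"):
    source = path.read_text()
    if len(source.splitlines()) > 200:
        print(f"{path}: file exceeds 200 lines")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.end_lineno - node.lineno + 1 > 10:
                print(f"{path}:{node.lineno} {node.name}() exceeds 10 lines")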

I’m assuming the OP is writing usage guides or concept docs. For that you need to see the way various components interact with each other. You may not always need to see the entire code base but it could be difficult to tease apart the individual files needed for a specific documentation task.