Hi,
I am creating a Python program that generates documentation for a given program based on its source code. The source code of the entire program takes around 100,000 tokens. I have written prompt instructions describing how the documentation should be created and what needs to be included; this prompt takes around 2,000 tokens.
I call the openai.ChatCompletion.create API to generate the documentation, but I get a context length exceeded error saying my input is more than 16,385 tokens.
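For reference, this is roughly how I measure the token count (assuming the tiktoken library and the gpt-3.5-turbo encoding; the file path is illustrative):

import tiktoken

# Count tokens the same way the model's tokenizer does
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
with open("my_project_source.py") as f:  # illustrative path
    source_code = f.read()
print(len(encoding.encode(source_code)))  # comes out around 100,000 for my codebase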
This is my API call:

import openai

# Function to generate documentation for each section
def generate_documentation(prompt, summarized_code):
    try:
        # Combine the instruction prompt with the code to document
        full_prompt = f"{prompt}\n\nSource Code:\n{summarized_code}"
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": full_prompt}
            ],
            max_tokens=3000,
            n=1,
            stop=None,
            temperature=0.7,
        )
        return response.choices[0].message["content"]
    except openai.error.OpenAIError as e:
        print(f"API call failed: {e}")
        return None
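For context, this is a minimal sketch of how I invoke the function (file paths are illustrative):

# Load the instruction prompt and the full source code, then generate docs
with open("documentation_prompt.txt") as f:  # illustrative path
    prompt = f.read()
with open("my_project_source.py") as f:  # illustrative path
    source_code = f.read()

documentation = generate_documentation(prompt, source_code)
print(documentation)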
I have already tried different options such as summarizing and splitting the source code, but they do not work: splitting the code does not generate correct documentation, so it is not an option.
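To be concrete, this is roughly the splitting approach I tried (the chunk size and helper name are illustrative); because each chunk is documented in isolation, references across chunks are lost and the output comes out wrong:

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def split_into_chunks(source_code, max_tokens=12000):
    # Naive token-based splitting; chunk boundaries ignore function/class structure
    tokens = encoding.encode(source_code)
    return [
        encoding.decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

# Each chunk is documented separately, so cross-chunk context is missing
partial_docs = [
    generate_documentation(prompt, chunk)
    for chunk in split_into_chunks(source_code)
]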
My requirement is only fulfilled if the complete source code is considered in one go. In some cases, the source code might take 200,000 tokens.
I would like to know the alternative approaches and best practices for achieving this goal, where my complete source code file is used in this API call and the documentation is generated in one go.
I have also tried to explore vector databases, but I don't have much knowledge of them. I want an expert opinion on how to handle this issue. I am also fine with uploading the source code file somewhere if needed.
Any help will be highly appreciated.
Thanks,
SalAi