Help Needed to Optimize Costs and Tool Usage in API with FastAPI and Code Interpreter

Hello everyone,

I’m developing an API using FastAPI and OpenAI’s Code Interpreter to design a chatbot that can provide personalized responses and perform data analysis. However, I’ve noticed that my costs are increasing, and I’d like to get some clarity on where OpenAI resources are being utilized and how to optimize their usage.

Problem Summary:

I’m using a function to upload documents, analyze them, and generate personalized responses.

Details:

  1. Functionality: The user uploads a document (or selects a previously uploaded one) and sends a message for analysis. I’m using OpenAI’s Code Interpreter to analyze the file’s content and generate personalized responses.
  2. Problem: My OpenAI costs are rising significantly, and I’m not sure where the resources are being consumed (tokens, processing, etc.). I’ve been running tests, but my TPM (Tokens Per Minute) metrics for both input and output are high.
  3. Objective: I would like to understand:
  • Why am I seeing such high costs, even though I’m only running tests?
  • How can I optimize the usage of the Code Interpreter and file handling in my API to reduce costs?
  • What is the correct way to use tools like files.create and threads.runs.create in this context to avoid excessive token usage?

The main flow involves uploading files to OpenAI and executing runs to process the file with Code Interpreter. After obtaining the result, I delete the file from OpenAI. Here is the code I’m working with:

My Code:

@app.post(
    "/analyze_document/",
    tags=["Code Interpreter"],
    summary="Upload and analyze a document",
    response_model=AssistantResponse,
    response_description="The result of the document analysis."
)
async def analyze_document(
    thread_id: str = Form(...),
    assistant_id: str = Form(...),
    file: UploadFile = File(None),  # Now optional
    file_id: str = Form(None),  # file_id parameter added
    message: str = Form(...)
):
    temp_file_path = None

    # Check whether a file or a file_id was provided
    if file:
        # Create a temporary file from the uploaded file
        with tempfile.NamedTemporaryFile(delete=False, suffix=file.filename) as temp_file:
            temp_file.write(await file.read())
            temp_file_path = temp_file.name
    elif file_id:
        # Retrieve the file from GridFS
        file_id = ObjectId(file_id)
        grid_out = fs.get(file_id)

        # Create a temporary file from the GridFS document
        with tempfile.NamedTemporaryFile(delete=False, suffix=grid_out.filename) as temp_file:
            temp_file.write(grid_out.read())
            temp_file_path = temp_file.name
    else:
        raise HTTPException(status_code=400, detail="Either file or file_id must be provided.")

    # Upload the file to OpenAI from the temporary file on disk
    with open(temp_file_path, "rb") as file_obj:
        file_response = cliente.files.create(
            file=file_obj,
            purpose='assistants'
        )
        openai_file_id = file_response.id

    # Attach the uploaded file to the thread's Code Interpreter resources
    cliente.beta.threads.update(
        thread_id=thread_id,
        tool_resources={"code_interpreter": {"file_ids": [openai_file_id]}}
    )

    # Add the user's message to the thread (don't shadow the `message` parameter)
    cliente.beta.threads.messages.create(
        thread_id=thread_id,
        role='user',
        content=message
    )

    # Start a run so the assistant processes the thread with Code Interpreter
    run = cliente.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id,
    )

    # Wait for the run to complete, then fetch the messages
    text_content = ""
    while True:
        run_status = cliente.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id
        )

        if run_status.status == 'completed':
            messages = cliente.beta.threads.messages.list(
                thread_id=thread_id
            )

            # Process only the messages associated with this run_id
            text_content = process_messages(messages, run.id)

            break
        elif run_status.status in ('failed', 'cancelled', 'expired'):
            # Stop polling if the run ends without completing; clean up the file first
            cliente.files.delete(openai_file_id)
            raise HTTPException(status_code=500, detail=f"Run ended with status: {run_status.status}")
        else:
            time.sleep(2)

    # Delete the uploaded file from OpenAI once the result has been retrieved
    cliente.files.delete(openai_file_id)
    
    return AssistantResponse(message=text_content)
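
(Side note: the polling loop above could probably be shortened. A sketch, assuming a recent openai Python SDK that exposes the runs.create_and_poll helper:)

# Sketch: same run, but letting the SDK handle the polling
# (assumes a recent openai Python SDK with runs.create_and_poll).
run = cliente.beta.threads.runs.create_and_poll(
    thread_id=thread_id,
    assistant_id=assistant_id,
)

if run.status == 'completed':
    messages = cliente.beta.threads.messages.list(thread_id=thread_id)
    text_content = process_messages(messages, run.id)
else:
    raise HTTPException(status_code=500, detail=f"Run ended with status: {run.status}")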

My Specific Questions:

  1. Is the process of uploading and then deleting files in each request efficient in terms of costs? Should I consider reusing files in some way? (There’s a rough sketch of what I mean after this list.)
  2. Does the Code Interpreter tokenize the entire file, or just the portion that is actually processed?
  3. Is there any way to optimize this flow so that I don’t consume so much during my tests?
  4. How can I correctly use OpenAI’s tools to keep costs under control while designing my chatbot?
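
To clarify what I mean by “reusing files” in question 1, here is a rough sketch. It uses a hypothetical files_meta collection (the `db` handle is not in my code above) to cache the OpenAI file id, so each document would only be uploaded to OpenAI once:

# Rough sketch of file reuse: cache the OpenAI file id in a hypothetical
# `files_meta` MongoDB collection so a document is only uploaded once.
def get_or_upload_file(file_id: str) -> str:
    meta = db.files_meta.find_one({"_id": ObjectId(file_id)})
    if meta and meta.get("openai_file_id"):
        return meta["openai_file_id"]  # already uploaded, just reuse it

    grid_out = fs.get(ObjectId(file_id))
    file_response = cliente.files.create(
        file=(grid_out.filename, grid_out.read()),
        purpose="assistants"
    )
    db.files_meta.update_one(
        {"_id": ObjectId(file_id)},
        {"$set": {"openai_file_id": file_response.id}},
        upsert=True
    )
    return file_response.id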

I would greatly appreciate any guidance or advice from the community.

Thanks!

From what I’ve seen, uploading and deleting the file is not that expensive. I process many files (PDFs) each day, use each one only once, and this is exactly what I do. I was surprised by the low cost of running this every day with gpt-4o.

I checked the fees for storing files in the datastore a while ago, and they seem far from cheap in the long run, so deleting the file right after use is, from my point of view, the most economical option.

If you could instead use function calling with the right parameters and do the processing on your end, it would be way more cost effective, though I don’t know what you need to do with the file content. It can sometimes be frustrating to get function calling working correctly, but once it works you can add many functions and the model is very good at choosing the right function with the right parameters. By the way, ChatGPT is very good at writing function definitions that “speak” clearly to the OpenAI API. I have 25+ different functions; see the sketch below for the general shape of such a definition.
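
To make that concrete, here is a minimal sketch of a function (tool) definition for the Chat Completions API, with a hypothetical summarize_report function. The model only picks the function and fills in the arguments; the actual processing happens in your own code, so the file never has to be uploaded to OpenAI:

from openai import OpenAI

client = OpenAI()

# Hypothetical tool: "summarize_report" is just an example name; the model
# returns the arguments and your own code does the real work.
tools = [
    {
        "type": "function",
        "function": {
            "name": "summarize_report",
            "description": "Summarize a report that is already stored on our side.",
            "parameters": {
                "type": "object",
                "properties": {
                    "report_id": {"type": "string", "description": "ID of the report in our database"},
                    "metric": {"type": "string", "description": "Optional metric to focus on"}
                },
                "required": ["report_id"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize report 42, focusing on revenue"}],
    tools=tools
)

tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    # Run your own summarize_report(**json.loads(tool_calls[0].function.arguments))
    # here, then send the result back in a follow-up request if needed.
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)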

I’m facing an issue with my current implementation and would really appreciate your help. Every time I make a query through my code, I receive a relatively high charge from the OpenAI API. I need to better understand how sessions work in this context.

My main question is: once I start a session with the OpenAI API, can I continue making queries within the same session and perform data analysis without incurring additional high charges? Specifically, I’d like to know whether it’s possible to keep subsequent queries within the same session at the same cost (e.g., $0.03) instead of incurring an additional cost for each new interaction.

Additionally, I’d like to know how I can have a prolonged conversation with the assistant. Once I upload a dataset, is it possible to continue making queries about it without incurring extra charges for each query? I’m looking for a way to maintain an ongoing conversation with the assistant after uploading data, optimizing the cost.
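
For context, the kind of follow-up flow I have in mind is roughly this (a sketch, borrowing the `cliente`, thread_id and assistant_id names from the code earlier in the thread): the dataset stays attached to the thread, and each new question is just another message plus a new run on that same thread.

# Sketch: reuse an existing thread for a follow-up question instead of
# uploading the dataset again.
cliente.beta.threads.messages.create(
    thread_id=thread_id,
    role="user",
    content="Using the dataset you already have, what is the average value per month?"
)

follow_up_run = cliente.beta.threads.runs.create(
    thread_id=thread_id,
    assistant_id=assistant_id
)
# Poll follow_up_run as before. The file attached to the thread's
# code_interpreter tool_resources is still available, although the growing
# thread history still counts toward input tokens on each new run.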

Thank you very much for your time and assistance!