I am uploading all the files located in a directory to OpenAI in bulk:
from openai import OpenAI
from dotenv import load_dotenv
import hashlib
import os
from pathlib import Path

load_dotenv()

client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
)

def getFilelistFileName(directory):
    # The record file is named after the SHA-256 hash of the directory path, under appdata/
    h = hashlib.new('sha256')
    h.update(directory.encode())
    return "appdata/" + h.hexdigest()

def listUploadedFiles(directory):
    # Returns the paths already uploaded for this directory (the record itself is maintained separately)
    fileListFile = getFilelistFileName(directory)
    file_list = []
    if os.path.isfile(fileListFile):
        with open(fileListFile, 'r') as fp:
            file_list = [line.strip() for line in fp]
    return file_list

def uploadFiles(directory):
    global client
    file_list = listUploadedFiles(directory)
    dirPath = Path(directory)
    uploaded_files = []
    for file in dirPath.iterdir():
        # Skip sub-directories and files that are already recorded as uploaded
        if not file.is_file() or str(file) in file_list:
            continue
        with open(file, "rb") as fh:
            response = client.files.create(
                file=fh,
                purpose="assistants"
            )
        uploaded_files.append(response.id)
    return uploaded_files

if __name__ == "__main__":
    uploadFiles('files/social')
But in the worst case this performs as many API calls as there are files, which implies two things:
- I have to delay the calls, which makes my application slower (see the throttling sketch after this list).
- I make too many requests: with 1000 files I have to perform the API call 1000 times, which may blow through the rate limit.
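
For illustration, the throttled variant I am trying to avoid would look roughly like this (just a sketch reusing listUploadedFiles and client from the snippet above; the one-second pause is an arbitrary value I picked, not a documented limit):

import time

def uploadFilesThrottled(directory, pause_seconds=1.0):
    # Same loop as uploadFiles above, but sleeps between requests to stay under the rate limit
    file_list = listUploadedFiles(directory)
    uploaded_files = []
    for file in Path(directory).iterdir():
        if not file.is_file() or str(file) in file_list:
            continue
        with open(file, "rb") as fh:
            response = client.files.create(file=fh, purpose="assistants")
        uploaded_files.append(response.id)
        time.sleep(pause_seconds)  # 1000 files means roughly 1000 extra seconds of waiting
    return uploaded_files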
How can I upload files in bulk with fewer API calls? I already keep a record of what I have uploaded, and I am looking for a way to upload the files en masse.