How do I download files generated in AI Assistants?

How do I go about downloading files generated by an OpenAI Assistant?
I have file annotations like this

 TextAnnotationFilePath(end_index=466, file_path=TextAnnotationFilePathFilePath(file_id='file-7FiD35cCwF6hv2eB7QOMT2rs'), start_index=427, text='sandbox:/mnt/data/stat_sig_pie_plot.png', type='file_path')]

I am not able to read the file object using .read()

I'm having the same problem. I can retrieve the 'file' from the API but I haven't been able to figure out how to decode it. I am getting a PNG image created but can't seem to get it to an openable format.

It's crazy how hard this was to figure out - had to go digging through the SDK to put the pieces together.

This works for me:

import OpenAI from 'openai';
import fs from 'fs';

(async function run() {
  const fileid = 'file-kqzPeg6MhD0HoCaDnaK3XSJN';
  console.log('Loading ', fileid);
  const openai = new OpenAI();

  const file = await openai.files.content(fileid);

  console.log(file.headers);

  const bufferView = new Uint8Array(await file.arrayBuffer());

  fs.writeFileSync('file-kqzPeg6MhD0HoCaDnaK3XSJN.png', bufferView);
})();

Replace fileid with the image_file.file_id or other ids you get and it should work. There's no easy way to figure out the extension programmatically other than to go digging through the headers and to parse them.
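For Python users, here is a rough sketch of two ways to guess the extension. It assumes the download response carries a meaningful Content-Type header, which has not been verified for every file type, so a magic-byte check is included as a fallback. Both helper names are made up for illustration and are not part of the OpenAI SDK.

```python
import mimetypes

# Hypothetical helper: map a Content-Type header value to a file extension.
def extension_from_content_type(content_type, default=".bin"):
    # Drop parameters such as "; charset=..." before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mimetypes.guess_extension(mime) or default

# A few well-known file signatures; extend as needed.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"%PDF-", ".pdf"),
]

# Hypothetical fallback: sniff the first bytes of the downloaded content.
def extension_from_bytes(data, default=".bin"):
    for signature, extension in _SIGNATURES:
        if data.startswith(signature):
            return extension
    return default
```

The byte check only recognises the formats listed in `_SIGNATURES`; text formats such as CSV have no signature, so the header (or the original file name) is the better source for those.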

If anyone has better ways please do share them!

I am adding some more example code to the section of the docs that goes over this (OpenAI Platform); just waiting for approval and a deploy.

Cool, thanks folks. Figured it out using the SDK code. Forgot to update here :confused:

The text you added to the documentation says that the link is of the form:

sandbox:/mnt/data/shuffled_file.csv

But that is not a valid URL at all. How are we supposed to get those files from the playground?

you have to use

await openai.files.content

to get the file
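In Python the same idea might look like the sketch below. The sandbox:/mnt/data/... string is only useful as a source for the file name; the bytes come from the file_id on the annotation. The two function names and the `out_dir` parameter are hypothetical, and `client` is assumed to be an already constructed OpenAI client.

```python
import os
import posixpath

# Hypothetical helper: 'sandbox:/mnt/data/plot.png' -> 'plot.png'.
def filename_from_sandbox_path(annotation_text):
    # Strip the 'sandbox:' scheme if present, then keep the last path part.
    path = annotation_text.split(":", 1)[-1]
    return posixpath.basename(path) or "download.bin"

# Hypothetical helper: download the file behind a file_path annotation.
def save_annotated_file(client, annotation, out_dir="."):
    name = filename_from_sandbox_path(annotation.text)
    # files.content(...).read() returns the raw bytes of the file.
    data = client.files.content(annotation.file_path.file_id).read()
    out_path = os.path.join(out_dir, name)
    with open(out_path, "wb") as fh:
        fh.write(data)
    return out_path
```

Called with the annotation from the first post, this would write stat_sig_pie_plot.png into `out_dir`.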

That is the way to download it programmatically via Python, OK. It works. But what if I want to download the generated file while I'm in the playground using Safari? The link to download the file gives an error and says that the JavaScript is not valid, while the button to download the file in the files section doesn't do anything. Telling the Assistant that the download link doesn't work, or asking for a link to the file, even specifying the file ID, makes the assistant answer with a link to a Google server!!

https://storage.googleapis.com/assistant-sandbox-attachments/session_WuFUqxIJG99AoTWnihXVz/file-85NrdwClpdBrNbZLywortwj9

If I ask again, the Assistant apologizes, and then answers with this other link:

https://assistant-sandbox.dynalistcdn.com/sandbox/file-85NrdwClpdBrNbZLywortwj9

Both are completely bogus links…

Conclusion: ATM there is no way to download a file generated by the AI using the browser when using the playground to test the assistants.

I am getting the same behaviour in ChatGPT… can't download files being presented, just links to sandbox:/mnt/data/… or then "://bellard.org/textsynth/assets/chatgpt/sandbox?path=/mnt/data/image_recolored.png" if I press for alternatives.

It is possible via the client by using the file_id.

The steps are:

  • Get the file_id from the thread
  • Load the bytes from the file using the client
  • Save the bytes to file

If working in python:

# open_ai_client = ...
# thread = ...

def get_response(thread):
    return open_ai_client.beta.threads.messages.list(thread_id=thread.id)

def get_file_ids_from_thread(thread):
    file_ids = [
        file_id
        for m in get_response(thread)
        for file_id in m.file_ids
    ]
    return file_ids

def write_file_to_temp_dir(file_id, output_path):
    file_data = open_ai_client.files.content(file_id)
    file_data_bytes = file_data.read()
    with open(output_path, "wb") as file:
        file.write(file_data_bytes)

# So to get a file and write it
file_ids = get_file_ids_from_thread(thread)
some_file_id = file_ids[0]
write_file_to_temp_dir(some_file_id, '/tmp/some_data.txt')

This section in the documentation explains how you can download files generated by tools. https://platform.openai.com/docs/assistants/tools/file-citations

tl;dr: you have to look for file_ids in the content array of the message and then download the file using https://platform.openai.com/docs/api-reference/files/retrieve-contents
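A minimal sketch of the "look for file_ids in the content array" step, assuming the message shape of the beta SDK discussed in this thread: each content block has a `type`, image blocks expose `image_file.file_id`, and text blocks expose `text.annotations` whose `file_path` entries carry a `file_id`. The function name is hypothetical.

```python
# Hypothetical helper: collect the file IDs referenced by one thread message.
def file_ids_from_message(message):
    ids = []
    for block in message.content:
        if block.type == "image_file":
            # Images produced by the code interpreter.
            ids.append(block.image_file.file_id)
        elif block.type == "text":
            # Downloadable files are referenced through file_path annotations.
            for annotation in block.text.annotations:
                if annotation.type == "file_path":
                    ids.append(annotation.file_path.file_id)
    return ids
```

Each returned ID can then be passed to the retrieve-contents endpoint linked above.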

@nikunj,
I'm still unclear on how to make files generated by the code interpreter downloadable.

The docs say:
"When annotations are present in the Message object, you'll see illegible model-generated substrings in the text that you should replace with the annotations."

In the Message Annotations docs for file_path annotations the code shows how to reword the annotation and that we need to separately download the file.

But it doesn't cover how to make the file download on the client when the message is clicked.

# Iterate over the annotations and add footnotes
for index, annotation in enumerate(annotations):
    # Replace the text with a footnote
    message_content.value = message_content.value.replace(annotation.text, f' [{index}]')

    # Gather citations based on annotation attributes
    if (file_citation := getattr(annotation, 'file_citation', None)):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f'[{index}] {file_citation.quote} from {cited_file.filename}')
    elif (file_path := getattr(annotation, 'file_path', None)):
        cited_file = client.files.retrieve(file_path.file_id)
        citations.append(f'[{index}] Click <here> to download {cited_file.filename}')
        # Note: File download functionality not implemented above for brevity

Can you please elaborate more on

# Note: File download functionality not implemented above for brevity

How is the cited_file meant to be attached to the citation?

Thanks in advance!

These instructions imply that I can download the file from the client directly from OpenAI's servers.

It makes sense that the files wouldn't be downloadable from a URL from a security perspective.

When receiving a file from the code interpreter in the Assistants playground I am directed to the files page and then have to click the download button to get my file.

For now, I'm planning to return a URL that hits a custom endpoint which will download the file.

(http://localhost:3000/api/assistant/file/{file-id})
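The link-rewriting half of that plan could be sketched in Python as follows. It swaps each sandbox path in the message text for a URL on your own server; the function name and the `base_url` parameter are hypothetical, and the endpoint behind the URL still has to fetch the bytes from OpenAI with your API key.

```python
# Hypothetical helper: point sandbox links at a custom download endpoint.
def link_generated_files(text, annotations, base_url):
    for annotation in annotations:
        # Only file_path annotations refer to downloadable generated files.
        if getattr(annotation, "type", None) == "file_path":
            url = f"{base_url.rstrip('/')}/{annotation.file_path.file_id}"
            text = text.replace(annotation.text, url)
    return text
```

With `base_url="http://localhost:3000/api/assistant/file"`, a markdown link such as `[Download](sandbox:/mnt/data/a.csv)` becomes a link to the custom route with the file ID appended.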

You can use the file ID in the citation to download the file using this endpoint: https://platform.openai.com/docs/api-reference/files/retrieve-contents

If someone wants to implement this in Django:

    from django.core.files.base import ContentFile

    def download_and_save_file(self, file_id, db_row_instance):
        """
        Download a file and store it in the DB/S3 bucket

        Args:
            file_id: The ID of the file to download.
            db_row_instance: The db_row_instance instance to attach the file to.

        Returns:
            File: File instance
        """
        file_data = self.open_ai.files.content(
            file_id=file_id,
        )
        file_data_bytes = file_data.read()
        # Create a ContentFile with the file data
        content_file = ContentFile(file_data_bytes)
        
        file_name_with_extension = f"{file_id}.png"
        # Save the ContentFile to the db_row_instance generated_file field
        db_row_instance.generated_file.save(file_name_with_extension, content_file)

My man THANK YOU. I've been trying to figure this out for about 8 hours. Wish I was exaggerating. I could NOT access that sandbox link to save my life…

QUICK NOTE: Had to remove '.id' from 'thread_id=thread.id' in get_response. Other than that, code works right out of the box.

Cheers

My adjustments in case anyone is a nerd like me:

import os
from openai import OpenAI
from dotenv import load_dotenv
from colorama import Fore

"""
Variation of Current API Calling Format as of 12/26/23
"""
load_dotenv()
try:
    client = OpenAI(
        api_key=os.environ['OPENAI_API_KEY']
        )
    if not client.api_key:
        raise ValueError("API key is missing. Check .env file.")
except KeyError as e:
    raise ValueError(f"Error Occurred: {e}\n\nCheck .env file.")
print(Fore.GREEN + 'API key loaded.')  # Avoid printing the key itself to the console


thread_id = 'thread_YourThread123'
output_path = '/path/to/your/output/file'


"""
Obtain the File IDs within the Specified Thread
"""
def get_response(thread_id):
    return client.beta.threads.messages.list(thread_id=thread_id)

def get_file_ids_from_thread(thread_id):
    # thread_id is the plain ID string, not a thread object
    file_ids = [
        file_id
        for m in get_response(thread_id)
        for file_id in m.file_ids
    ]
    return file_ids


"""
Write Each File ID's Contents with Separator Implementation for Readability
"""
def write_file(file_id, count, output_path=output_path):
    file_data = client.files.content(file_id) # Extract the content from the file ID
    file_content = file_data.read() # Assign the content to a variable
    separator_start = f'\n\n\n\nFILE # {count + 1}\n\n\n\n'
    separator_end = '\n\n\n\n' + '#' * 100 + '\n\n\n\n'

    with open(output_path, "ab") as file:
        file.write(separator_start.encode())  # Encode the string to bytes
        file.write(file_content) # Write the content
        file.write(separator_end.encode())    # Encode the string to bytes


"""
Iterate through the File IDs while Calling write_file for File Output
"""
file_ids = get_file_ids_from_thread(thread_id) # Retrieve file IDs
print('\nFILE IDS: ', file_ids)
print('\nNUMBER OF FILE IDS: ', len(file_ids))
for count, file_id in enumerate(file_ids):
    print(Fore.GREEN + f'\nWriting file #{count + 1}...\n')
    write_file(file_id, count) # Write file ID contents
    print(Fore.GREEN + f'File {count + 1} written.\n')

print('Done.')
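Appending every file to a single output works for text, but binary files such as PNGs will not open afterwards. Below is a sketch of a variation that writes one file per ID. It assumes `client.files.retrieve(file_id).filename` holds either a bare name or a sandbox-style path such as /mnt/data/plot.png; the function name and `out_dir` parameter are hypothetical.

```python
import os
import posixpath

# Hypothetical variation: write each file ID to its own file in out_dir.
def write_each_file(client, file_ids, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for file_id in file_ids:
        meta = client.files.retrieve(file_id)
        # Keep only the last path component of the reported file name.
        base = posixpath.basename(meta.filename or "")
        # Prefix with the file ID so two files with the same name do not collide.
        name = f"{file_id}-{base}" if base else file_id
        out_path = os.path.join(out_dir, name)
        with open(out_path, "wb") as fh:
            fh.write(client.files.content(file_id).read())
        written.append(out_path)
    return written
```

Usage would be `write_each_file(client, file_ids, '/path/to/your/output/dir')`, reusing the `client` and `file_ids` from the script above.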