How to Modify a Spreadsheet Using OpenAI API and Download the Updated File via Python Script?

Hi everyone,

I’m working on a Python script that uploads a spreadsheet (.csv, .xlsx, or .xls), uses OpenAI’s code_interpreter tool to make modifications based on a prompt, and then retrieves the updated file. My goal is to automate this process so that the spreadsheet can be modified dynamically and the updated version can be saved locally.

Here’s a summary of what I’m trying to do:

  1. Upload a spreadsheet file to OpenAI (in this case, I’ve converted it to CSV for consistency).
  2. Use OpenAI’s API to create an Assistant with the code_interpreter tool and provide instructions to modify the spreadsheet (e.g., add a new column with specific values).
  3. Retrieve the modified file or content and save it locally as a new file.

The issues I’m facing:

  1. Although my script seems to run without major errors, the updated spreadsheet content is not being correctly modified or retrieved.
  2. The Assistant often returns text responses instead of actual file content.
  3. I receive messages indicating file parsing issues, such as delimiter problems in the CSV.
  4. When attempting to fetch the updated content, the file either does not get updated as expected or does not get saved locally.

Specific Questions:

  1. Am I using the correct endpoints for this task (files.create, assistants.create, etc.)?
  2. Is the code_interpreter tool the best approach to use for spreadsheet manipulation, or is there another tool within OpenAI’s API that might work better?
  3. What is the recommended way to structure prompts for spreadsheet modifications to ensure the Assistant performs the requested tasks correctly (e.g., adding a column with specific values)?
  4. How can I ensure the modified spreadsheet is properly retrieved and saved locally?

Here is the full script I’m using:

python

Copy code

import os
import time
import pandas as pd
from openai import OpenAI

# Initialize the OpenAI client with your API key
client = OpenAI(api_key='YOUR_OPENAI_API_KEY')

# Define the directory where the updated file will be saved
output_dir = "/path/to/your/output/directory"
output_file = os.path.join(output_dir, "updated_spreadsheet.csv")

# Ensure the output directory exists
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# Step 1: Convert the Excel file to CSV
input_excel = "/path/to/your/input/spreadsheet.xlsx"
csv_file_path = os.path.join(output_dir, "test.csv")

# Convert Excel to CSV
df = pd.read_excel(input_excel)
df.to_csv(csv_file_path, index=False)

# Step 2: Upload the CSV file to OpenAI
try:
    with open(csv_file_path, "rb") as file:
        file_response = client.files.create(file=file, purpose='assistants')
    file_id = file_response.id
    print(f"First file uploaded: {file_id}")
except Exception as e:
    print(f"An error occurred while uploading the file: {e}")
    exit()

# Step 3: Create an Assistant with the code_interpreter tool
try:
    assistant = client.beta.assistants.create(
        name="Spreadsheet Assistant",
        description="An Assistant for editing and overwriting CSV files.",
        model="gpt-4",
        tools=[{"type": "code_interpreter"}],
        tool_resources={
            "code_interpreter": {
                "file_ids": [file_id]
            }
        }
    )
    assistant_id = assistant.id
    print(f"Assistant created: {assistant_id}")
except Exception as e:
    print(f"An error occurred while creating the Assistant: {e}")
    exit()

# Step 4: Start a thread with instructions to modify the spreadsheet
try:
    thread = client.beta.threads.create(
        messages=[
            {
                "role": "user",
                "content": (
                    "Please load the CSV file, add a new column called 'Test Column' with the value 'test' "
                    "for all rows, and return the updated CSV file as a download."
                ),
                "attachments": [
                    {
                        "file_id": file_id,
                        "tools": [{"type": "code_interpreter"}]
                    }
                ]
            }
        ]
    )
    thread_id = thread.id
    print(f"Thread created: {thread_id}")
except Exception as e:
    print(f"An error occurred while creating the thread: {e}")
    exit()

# Step 5: Run the thread and wait for the Assistant to complete the task
try:
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )
    run_id = run.id

    # Poll for the run to complete
    run_status = run.status
    while run.status == "queued" or run.status == "in_progress":
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run_id
        )
        time.sleep(0.5)
        run_status = run.status
        print(f"Run status: {run_status}")

    if run_status == "completed":
        print("Run completed successfully!")
    else:
        print(f"Run failed with status: {run_status}")
        exit()
except Exception as e:
    print(f"An error occurred during the run: {e}")
    exit()

# Step 6: Retrieve the updated CSV content from the Assistant's response
try:
    print("Step 6: Fetching the list of messages in the thread...")

    # Retrieve the list of messages
    messages_response = client.beta.threads.messages.list(thread_id=thread_id)
    messages = messages_response.data
    print(f"Total messages retrieved: {len(messages)}")

    csv_content = None

    # Inspect the content of each message
    for idx, message in enumerate(messages):
        print(f"Message {idx + 1}/{len(messages)}: {message}")
        print(f"Message role: {message.role}, run_id: {message.run_id}")

        # Check for text content that looks like CSV content
        for content_block in message.content:
            if content_block.type == "text" and ',' in content_block.text.value:
                csv_content = content_block.text.value
                print(f"Found potential CSV content in message {idx + 1}")
                break

        if csv_content:
            break

    # If CSV content was found, save it locally
    if csv_content:
        print("Saving the updated CSV content locally...")
        with open(output_file, "w") as f:
            f.write(csv_content)
        print(f"Updated CSV saved successfully as '{output_file}'.")

        # Optionally, save as Excel
        df = pd.read_csv(output_file)
        df.to_excel(os.path.join(output_dir, "updated_spreadsheet.xlsx"), index=False)
        print(f"Updated Excel file saved as 'updated_spreadsheet.xlsx'.")

    else:
        print("No CSV content found in the Assistant's response.")

except Exception as e:
    print(f"An error occurred while retrieving the updated content: {e}")

Any suggestions or guidance would be greatly appreciated! If someone has successfully used the OpenAI API to modify a spreadsheet and retrieve the updated file via Python, I’d love to hear how you approached it. Thanks in advance!

1 Like

Any successes with this? I was getting responses some time ago saying that the spreadsheet was being updated internally… I doubted it was happening but man that would be huge. I could see some problems in creating errors though.