Ask GPT-4o about a file - example Python function with base64 file upload, tiktoken, usage history, and a forced JSON return

Thought I'd post an example since it took some time to figure out:


import base64
import json
import logging
from datetime import datetime

import requests
import tiktoken
from psycopg2 import sql

settings = {
    'openai_api_key': 'sk-.............'  # Add your OpenAI API key here
}

# I pass it a connection to my database and an upload id (when I upload the file,
# I rename it to a UUID so I have a unique id to use as a key)

def send_to_gpt4o(image_path, upload_id, conn):
    # Function to encode the image
    def encode_image(image_path):
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')

    # Encode the image to base64
    base64_image = encode_image(image_path)

    # Prepare the request headers and payload
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {settings['openai_api_key']}"
    }

    payload = {
        "model": "gpt-4o",
        "response_format": {"type": "json_object"},
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant designed to output JSON."
            },
            {
                "role": "user",
                "content": (
                    "I need you to analyse the picture and give me a json of the objects that you can recognize"
                    "Here is an example JSON: "
                    "{\"objects\":[{\"objectname\":\"ball\"},{\"objectname\":\"hedgehog ;)\"}]}"
                )
            }
        ],
        "max_tokens": 300
    }

    # Rough token estimate based on the serialized payload
    # (needs a tiktoken release that knows the gpt-4o / o200k_base encoding)
    encoding = tiktoken.encoding_for_model("gpt-4o")
    tokens_used_prompt = len(encoding.encode(json.dumps(payload)))

    # Send the request to the OpenAI API
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    response_json = response.json()
    
    # Log the response for debugging
    logging.debug(f"API response: {response_json}")

    # Handle the response and calculate tokens used
    if 'choices' in response_json:
        response_content = response_json["choices"][0]["message"]["content"]
        
        # Clean the response content
        response_content = response_content.strip()
        try:
            response_content = json.loads(response_content)
        except json.JSONDecodeError as e:
            logging.error(f"JSON decode error: {e}")
            response_content = None

        if response_content:
            tokens_used_response = len(encoding.encode(json.dumps(response_content)))
            tokens_used_total = tokens_used_prompt + tokens_used_response

            # Insert the prompt and response into the database
            try:
                with conn.cursor() as cursor:
                    cursor.execute(sql.SQL("""
                        INSERT INTO prompt_history (upload_id, request_payload, response_content, tokens_used_prompt, tokens_used_response, tokens_used_total, created_at)
                        VALUES (%s, %s, %s, %s, %s, %s, %s)
                    """), (
                        upload_id,
                        json.dumps(payload),
                        json.dumps(response_content),
                        tokens_used_prompt,
                        tokens_used_response,
                        tokens_used_total,
                        datetime.now()
                    ))
                    conn.commit()
            except Exception as e:
                logging.error(f"Error saving prompt history: {e}")
                conn.rollback()

            return response_content
        else:
            logging.error("Invalid JSON response")
            return None
    else:
        error_message = response_json.get('error', 'Unknown error')
        logging.error(f"API Error: {error_message}")
        
        # Insert the prompt with error into the database
        try:
            with conn.cursor() as cursor:
                cursor.execute(sql.SQL("""
                    INSERT INTO prompt_history (upload_id, request_payload, response_content, tokens_used_prompt, tokens_used_response, tokens_used_total, created_at)
                    VALUES (%s, %s, %s, %s, %s, %s, %s)
                """), (
                    upload_id,
                    json.dumps(payload),
                    json.dumps({'error': error_message}),
                    tokens_used_prompt,
                    0,
                    tokens_used_prompt,
                    datetime.now()
                ))
                conn.commit()
        except Exception as e:
            logging.error(f"Error saving prompt history with error: {e}")
            conn.rollback()
        
        return None
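
Calling it then looks roughly like this (the connection string, file path and UUID below are placeholders, not from a real setup):

import uuid
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser password=secret")  # placeholder DSN

upload_id = str(uuid.uuid4())                 # normally created when the file is uploaded and renamed
image_path = f"/uploads/{upload_id}.jpg"      # placeholder path to the renamed file

result = send_to_gpt4o(image_path, upload_id, conn)
print(result)

conn.close()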

Ask ChatGPT for the requirements.txt :smiley:
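
Or, as a starting point, something like this (untested; the version pin is a guess):

# requirements.txt
requests
tiktoken>=0.7.0      # needs the o200k_base / gpt-4o encoding
psycopg2-binary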

Here are the table definitions:

CREATE TABLE IF NOT EXISTS uploads (
    id UUID PRIMARY KEY,
    original_filename TEXT,
    status TEXT
);

CREATE TABLE IF NOT EXISTS prompt_history (
    history_id SERIAL PRIMARY KEY,
    upload_id UUID,
    request_payload JSON,
    response_content JSON,
    tokens_used_prompt INT,
    tokens_used_response INT,
    tokens_used_total INT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT fk_upload FOREIGN KEY(upload_id) REFERENCES uploads(id)
);
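
For completeness, the upload row mentioned at the top gets created roughly like this when a file comes in (an illustrative sketch, not my actual upload handler; names and paths are made up):

import os
import shutil
import uuid

def register_upload(original_path, conn, upload_dir="/uploads"):
    # Rename the incoming file to a UUID so it can be used as a key everywhere else
    upload_id = str(uuid.uuid4())
    extension = os.path.splitext(original_path)[1]
    new_path = os.path.join(upload_dir, f"{upload_id}{extension}")
    shutil.move(original_path, new_path)

    # Record the upload so prompt_history rows can reference it
    with conn.cursor() as cursor:
        cursor.execute(
            "INSERT INTO uploads (id, original_filename, status) VALUES (%s, %s, %s)",
            (upload_id, os.path.basename(original_path), 'uploaded')
        )
    conn.commit()

    return upload_id, new_path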

Enjoy the debugging process, or just feed this code to GPT-4o together with your own code and ask it to provide a full working example :wink:


How is base64_image supposed to get into the chat completions user message to be analyzed?

You seem to have missed actually sending the picture. Also, you need to ascertain the MIME type of the image file (a small sketch for that follows the snippet below). Save some time by employing some resize logic and switching on “detail”: “low”…

  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What’s in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        }
      ]
    }
  ],
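
For the MIME type, something along these lines would do (a sketch using the stdlib mimetypes module; good enough as long as your file names carry an extension):

import base64
import mimetypes

def to_data_url(image_path):
    # Guess the MIME type from the file extension; fall back to JPEG
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "image/jpeg"
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime_type};base64,{b64}"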

The token usage of images is based on the detail setting and the number of tiles, and that is not calculated here at all (a rough estimator is sketched below).
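
Roughly, the high-detail calculation works like this (a sketch of the documented scheme: scale into 2048x2048, shortest side down to 768px, 170 tokens per 512px tile plus 85 base; treat the numbers as an approximation):

import math

def estimate_image_tokens(width, height, detail="high"):
    # "detail": "low" costs a flat amount, independent of image size
    if detail == "low":
        return 85
    # Scale down to fit within a 2048 x 2048 square, keeping the aspect ratio
    if max(width, height) > 2048:
        scale = 2048 / max(width, height)
        width, height = width * scale, height * scale
    # Scale down so the shortest side is at most 768px
    if min(width, height) > 768:
        scale = 768 / min(width, height)
        width, height = width * scale, height * scale
    # 170 tokens per 512px tile, plus a flat 85 base tokens
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 170 * tiles + 85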

The token usage of input is also not calculated from the JSON you send, but from the text the model actually receives: the role, name, message, and tools are rendered into plain text with the overhead of container tokens that you cannot reproduce by simply encoding the payload. Then the image tokens come on top.

The API response already has the input token counts if you are simply recording for posterity and not taking action before sending.
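
For example, instead of re-tokenizing the payload yourself, you can read the usage block that comes back with every chat completion:

# The Chat Completions response carries a usage block; no local tokenizer needed
usage = response_json.get("usage", {})
tokens_used_prompt = usage.get("prompt_tokens", 0)      # includes the image tokens
tokens_used_response = usage.get("completion_tokens", 0)
tokens_used_total = usage.get("total_tokens", 0)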

I am quite confused how you actually asked the AI about a picture.


Oh, yeah, that must have happened when I edited my actual prompt from my system… I didn't want to post that one.

Great finding :smiley:


Good catch!

I tried this code, but unfortunately ChatGPT does not like to look at images… :frowning:
It replies with: “I’m sorry for any confusion, but I’m unable to see images directly. However, if you describe the image to me, I’d be happy to help you understand or analyze it!”

Any clues? :slight_smile: