GPT-4o model isn't recognizing items in this image that seem obvious

I was wondering if you could let me know why this happens. I'm setting detail to high. Before, I was using Chat Completions and uploading the image as base64, but now I'm using a standard assistant function from my add-on that uploads the image as a file and then uses it from there, because from what I understand it then isn't converted to base64, which (as I understood it) turns it into a black-and-white image, so in theory it should see the contents better. I've also tried setting the contrast of the image very high so it could see the node links. Anyway, here's the image, and I've put the full function here too… I apologize if that's a big deal, but

I wanted you to see what I'm doing, and here's the data it's creating. It seems like no matter what I do, if you look at the Render Layers node, I can't ever get it to output the proper links coming from the Render Layers node into the other nodes. That's the only node it has issues with, but it keeps saying the links come out of the Image output of the Render Layers node, when it's obvious in the image that they come out of the three node outputs that begin with "Diff". Can you give me any advice on this? Is there something I'm doing wrong? Is the API not set up to process this kind of image? If it isn't, could you give me an idea of another API like yours that might be able to process these, which I could use in tandem for my image processing? Or, if there is none, would you recommend using OpenCV?
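For comparison, the base64 route I was using before looked roughly like this (a minimal sketch, not my exact add-on code; the image path and prompt text here are placeholders, and get_api_key() is the same helper used in the function below):

import base64
import openai

client = openai.Client(api_key=get_api_key())

# Encode the image bytes as base64 for a data URL
with open("node_setup.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract the Blender node names, settings, and links from this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}", "detail": "high"}},
        ],
    }],
)
print(response.choices[0].message.content)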

Image:

Function:

import json
import time
from datetime import datetime, timedelta

import bpy
import openai
import requests
from halo import Halo

# Add-on helpers such as print_color, get_api_key, stop_spinner, pr,
# set_token_properties, and the add-on module `name` are defined elsewhere.

def generate_node_image_data(data_image):
    manager = bpy.context.preferences.addons[name].preferences.Prop

    global result_ready_data
    global active_thread_id_9
    global global_run_id_9
    global spinners

    try:
        print("")
        if manager.gpt_disable_spinners:
            print_color("AB", "EXTRACTING NODE INFO...PLEASE WAIT...")
        else:
            spinners = Halo(text="EXTRACTING NODE INFO", text_color='blue', spinner='arc', color='blue')
            spinners.start()

        api_key = get_api_key()
        client = openai.Client(api_key=api_key, timeout=50)

        user_message_content = "\n\nData Image Included: \n\n"

        instructions = """
            Your task is to check the image and extract detailed information about Blender nodes. For each node in the image, provide the following details:
            1. Node Name.
            2. Settings of the node with names and values.
            After listing all nodes and their settings, describe the connections between nodes, specifying which output of one node is connected to which input of another node.
            Ensure the description is clear and each connection is listed on a separate line.
            If the image you received is not of a Blender node setup (e.g. geometry nodes, compositor nodes, or shader nodes), or the image is unclear and you can't see settings/names/linkages for every node, then output 'Could not process image.' as your only response.
        """

        # Helpers for managing the uploaded file before creating the assistant
        def delete_file(file_id, api_key):
            print("\n🚀 file_id", file_id)
            url = f"https://api.openai.com/v1/files/{file_id}"
            headers = {
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json"
            }
            response = requests.delete(url, headers=headers)
            response_data = response.json()
            print("\nDelete Response:", response_data)
            return response_data

        def upload_file(file_path, api_key):
            url = "https://api.openai.com/v1/files"
            headers = {
                "Authorization": f"Bearer {api_key}"
            }
            # Context manager ensures the file handle is closed after the upload
            with open(file_path, "rb") as f:
                files = {
                    "file": f,
                    "purpose": (None, "assistants")
                }
                response = requests.post(url, headers=headers, files=files)
            response_json = response.json()
            file_id = response_json["id"]

            print("\nUpload Response:", json.dumps(response_json, indent=2))
            time.sleep(5)

            return file_id

        previous_file_id = manager.previous_file_id
        print_color("AW", f"\n🚀 Previous File ID: {previous_file_id}")

        if previous_file_id:
            delete_file(previous_file_id, api_key)

        new_file_id = upload_file(data_image, api_key)

        manager.previous_file_id = new_file_id
        print_color("AW", f"\n🚀 New File ID: {manager.previous_file_id}")

        # Create and run the assistant
        assistant = client.beta.assistants.create(
            name="FAST Autonomous GPT4 - Node Image Data - Data Analysis",
            description="This assistant is specialized in extracting Blender node data from images.",
            instructions=instructions,
            model="gpt-4o",
            tools=[{"type": "code_interpreter"}],
        )

        data_thread = client.beta.threads.create()
        active_thread_id_9 = data_thread.id

        client.beta.threads.messages.create(
            thread_id=data_thread.id,
            role="user",
            content=[
                {
                    "type": "text",
                    "text": user_message_content
                },
                {
                    "type": "image_file",
                    "image_file": {
                        "file_id": new_file_id,
                        "detail": "high",
                    }
                },
            ]
        )

        data_run = client.beta.threads.runs.create(
            thread_id=data_thread.id,
            assistant_id=assistant.id,
            temperature=0.7,
        )
        global_run_id_9 = data_run.id  # store the run id, like the thread id above

        # Poll until the run finishes or the 45-second window elapses
        end_time = datetime.now() + timedelta(seconds=45)

        while datetime.now() < end_time:
            time.sleep(1)
            data_run = client.beta.threads.runs.retrieve(thread_id=data_thread.id, run_id=data_run.id)

            if data_run.status == 'completed':
                stop_spinner()

                prompt, comp, tokens = pr(data_run.id, data_thread.id, 2)
                set_token_properties(prompt, comp, tokens, "gpt-4o")

                if manager.gpt_show_tokens:
                    print_color("AW", "\nDATA GPT 4o INPUT TOKENS = ", new_line=False)
                    print_color("AW", f"{prompt}")
                    print_color("AW", "\nDATA GPT 4o OUTPUT TOKENS = ", new_line=False)
                    print_color("AW", f"{comp}")
                    print_color("AW", "\nDATA GPT 4o TOTAL TOKENS = ", new_line=False)
                    print_color("AW", f"{tokens}")

                messages = client.beta.threads.messages.list(thread_id=data_thread.id)

                for message in messages.data:
                    if message.role == 'assistant':
                        result_ready_data = message.content[0].text.value
                        client.beta.threads.delete(thread_id=data_thread.id)
                        return result_ready_data

            elif data_run.status == 'incomplete':
                stop_spinner()

                if hasattr(data_run, 'incomplete_details'):
                    # Print details about why the run was incomplete
                    print_color("AW", f"\nDATA RUN STATUS: {data_run.status} DETAILS: {data_run.incomplete_details}")

                return "False"

            elif data_run.status == 'failed':
                stop_spinner()
                if hasattr(data_run, 'last_error'):
                    if 'repetitive patterns' in data_run.last_error.message:
                        print_color("AW", f"ERROR DUE TO REPETITIVE PATTERNS IN THE PROMPT:\n\nOPENAI ERROR: {data_run.last_error.message}")
                        return False
                    print_color("AR", f"Data run failed with error: {data_run.last_error.message}")
                return "Error in processing request"

        # Only reached if the run never completed inside the time window
        stop_spinner()
        print_color("AW", "Run timed out. OpenAI servers are receiving too many requests. Please try again later.")
        return "Data run did not complete in the allotted time. Please try again later."

    except Exception as e:
        stop_spinner()
        print_color("AW", f"Failed to complete assistant operation:\nError: {e}")
        manager.gpt_cancel_op = True
        return False

Data:

  1. Node Name: Render Layers

    • Settings:
      • Image: (Output)
      • Alpha: (Output)
      • Depth: (Output)
      • DiffDir: (Output)
      • DiffInd: (Output)
      • DiffCol: (Output)
      • GlossDir: (Output)
      • GlossInd: (Output)
      • GlossCol: (Output)
      • TransDir: (Output)
      • TransInd: (Output)
      • TransCol: (Output)
      • Scene: Scene
      • View Layer: View Layer
  2. Node Name: Denoise (Top)

    • Settings:
      • HDR: (Unchecked)
      • Image: (Input/Output)
      • Normal: (Input)
      • Albedo: (Input)
  3. Node Name: Denoise (Middle)

    • Settings:
      • HDR: (Unchecked)
      • Image: (Input/Output)
      • Normal: (Input)
      • Albedo: (Input)
  4. Node Name: Denoise (Bottom)

    • Settings:
      • HDR: (Unchecked)
      • Image: (Input/Output)
      • Normal: (Input)
      • Albedo: (Input)
  5. Node Name: Composite (Top)

    • Settings:
      • Use Alpha: (Checked)
      • Image: (Input)
      • Alpha: 1.000
      • Z: 1.000
  6. Node Name: Composite (Middle)

    • Settings:
      • Use Alpha: (Checked)
      • Image: (Input)
      • Alpha: 1.000
      • Z: 1.000
  7. Node Name: Composite (Bottom)

    • Settings:
      • Use Alpha: (Checked)
      • Image: (Input)
      • Alpha: 1.000
      • Z: 1.000

Connections:

  1. Render Layers (Image) → Denoise (Top) (Image)
  2. Render Layers (Image) → Denoise (Middle) (Image)
  3. Render Layers (Image) → Denoise (Bottom) (Image)
  4. Render Layers (Normal) → Denoise (Top) (Normal)
  5. Render Layers (Normal) → Denoise (Middle) (Normal)
  6. Render Layers (Normal) → Denoise (Bottom) (Normal)
  7. Render Layers (Albedo) → Denoise (Top) (Albedo)
  8. Render Layers (Albedo) → Denoise (Middle) (Albedo)
  9. Render Layers (Albedo) → Denoise (Bottom) (Albedo)
  10. Denoise (Top) (Image) → Composite (Top) (Image)
  11. Denoise (Middle) (Image) → Composite (Middle) (Image)
  12. Denoise (Bottom) (Image) → Composite (Bottom) (Image)

I believe this is litegraph.js, and it provides a very easy way to grab all the graph data. I would recommend following its instructions and then passing the JSON data instead of relying on a visual model:

graph.serialize(); 

https://tamats.com/projects/litegraph/doc/classes/LGraph.html#method_serialize

The issue, regarding your question, is that the model is not good at handling spatial locations and visual connections.
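Also, since your function already runs inside Blender (bpy), you could read the node graph directly from the compositor node tree and pass that as text, with no screenshot involved at all. A rough sketch, assuming the scene's compositor node tree is the one in your image:

import bpy
import json

tree = bpy.context.scene.node_tree  # the scene's compositor node tree

nodes = [{"name": n.name, "type": n.bl_idname} for n in tree.nodes]
links = [
    {
        "from_node": l.from_node.name,
        "from_socket": l.from_socket.name,  # e.g. "DiffDir" on Render Layers
        "to_node": l.to_node.name,
        "to_socket": l.to_socket.name,
    }
    for l in tree.links
]

# JSON the model can read as plain text, no vision needed
print(json.dumps({"nodes": nodes, "links": links}, indent=2))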


Thank you very much, I'll go ahead and check into that right away. I appreciate the advice.


Hope it works out. Happy coding


Thanks again… This has to be 25 characters so just filling it out, sorry it’s late…


I was wondering if you could give me any hints, like a tutorial, or at least something I could give to GPT, that would help me get started using this for detecting the nodes in the pictures properly?