Concurrent API Interaction with a Large Number of Images

I don’t know how to program; my system was built entirely by me describing the requirements while GPT wrote all the code.

I have already completed the development of the linear system, and it runs properly.

Now I want to convert it into a concurrent system, but I’m hitting the error below, and after repeated debugging I can’t get it to work. I hope someone can help me figure out how to transmit a large number of images (base64-encoded, passed as data URLs) and send them to GPT concurrently.
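
For reference, this is roughly the fan-out pattern I am trying to reach (a minimal sketch; `process_one` and `process_all` are hypothetical stand-ins for my real per-image processing function, which encodes the image and calls the API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_one(image_path):
    # Stand-in for the real work: base64-encode the image, call GPT,
    # validate the response, store the result.
    return {'success': True, 'item': image_path, 'status': 'processed'}

def process_all(image_paths, max_workers=4):
    """Run process_one over many images with a bounded thread pool."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(process_one, path): path for path in image_paths}
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as e:
                # One bad image should not kill the whole batch.
                results.append({'success': False, 'item': futures[future],
                                'status': 'error', 'error': str(e)})
    return results

results = process_all([f"img_{i}.png" for i in range(8)])
```

Threads should be enough here because the work is I/O-bound (waiting on the API), and `max_workers` caps concurrency so rate limits aren’t hammered.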

I have sent the program’s runtime feedback to GPT, and it mentioned that three components are involved. I have uploaded all of them. Many thanks to the forum members for your help.

first_round_interaction.py

# -*- coding: utf-8 -*-
from logger_config import setup_logger
from image_processor import imageBase64
from schema_validator import validate_extracted_content
import os

logger = setup_logger(__name__)


class FirstRoundInteraction:
    def __init__(self, model_factory, prompt_manager, data_access_manager, schema, config):
        self.model_factory = model_factory
        self.prompt_manager = prompt_manager
        self.data_access_manager = data_access_manager
        self.schema = schema
        self.config = config 

    def process(self, image_path, student_id, image_uuid, image_filename):
        try:
            if self.data_access_manager.is_image_processed(image_filename, student_id):
                logger.info(f"Image {image_filename} for student {student_id} already processed, skipping.")
                return {'success': True, 'item': image_filename, 'status': 'skipped'}

 
            # Encode the image as a base64 data URL
            image_data_url = imageBase64(image_path)

            # Fetch the round-1 prompt and the model for this stage
            prompt, prompt_name = self.prompt_manager.get_prompt_for_round(1)
            model = self.model_factory.get_model('first_round')

            response = model.generate_response(prompt, image_data_url)

            if validate_extracted_content(response, self.schema):
                for problem in response['problems']:
                    problem_uuid = self.data_access_manager.store_problem_data(problem, image_filename, image_path,
                                                                               student_id, image_uuid)
                    self.data_access_manager.record_interaction(problem_uuid, 1, prompt_name, model.model_name)

                self.data_access_manager.mark_image_as_processed(image_filename, student_id)
                return {'success': True, 'item': image_filename, 'status': 'processed'}
            else:
                return {'success': False, 'item': image_filename, 'status': 'invalid_response'}
        except Exception as e:
            logger.error(f"Error processing image {image_path}: {str(e)}", exc_info=True)
            return {'success': False, 'item': image_filename, 'status': 'error', 'error': str(e)}

model_factory.py

# -*- coding: utf-8 -*-
import openai
import json
from logger_config import setup_logger
from tenacity import retry, stop_after_attempt, wait_exponential

logger = setup_logger(__name__)

def get_image_mime_type(image_path):
    extension = image_path.lower().split('.')[-1]
    mime_types = {
        'jpg': 'image/jpeg',
        'jpeg': 'image/jpeg',
        'png': 'image/png',
        'gif': 'image/gif',
        'bmp': 'image/bmp',
        'webp': 'image/webp',
        'tiff': 'image/tiff',
        'svg': 'image/svg+xml'
    }
    return mime_types.get(extension, 'application/octet-stream')

class GPT4oModel:
    def __init__(self, api_key, base_url, model_name, max_tokens, temperature, top_p):
        self.client = openai.OpenAI(api_key=api_key, base_url=base_url)
        self.model_name = model_name
        self.max_tokens = max_tokens
        self.temperature = temperature
        self.top_p = top_p

    @retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=4, max=60))
    def generate_response(self, prompt, input_data=None, is_image=False):
        try:
            if is_image and input_data and input_data.startswith('data:image'):
                messages = [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": prompt
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": input_data
                                }
                            }
                        ]
                    }
                ]
                full_prompt = prompt
            else:
                full_prompt = prompt + "\n\nInput Data:\n" + json.dumps(input_data, ensure_ascii=False, indent=2)
                messages = [
                    {
                        "role": "user",
                        "content": full_prompt
                    }
                ]

            logger.info(f"Prompt sent to GPT (first 50 chars): {full_prompt[:50]}")

            response = self.client.chat.completions.create(
                model=self.model_name,
                messages=messages,
                max_tokens=self.max_tokens,
                temperature=self.temperature,
                top_p=self.top_p,
                response_format={"type": "json_object"},
                timeout=30
            )

            if hasattr(response, 'choices') and len(response.choices) > 0:
                content = response.choices[0].message.content
                return json.loads(content)
            else:
                raise ValueError("Unexpected response structure")
        except openai.RateLimitError as e:
            logger.error(f"OpenAI API rate limit error: {str(e)}", exc_info=True)
            raise
        except openai.APIConnectionError as e:
            logger.error(f"OpenAI API connection error: {str(e)}", exc_info=True)
            raise
        except openai.APIError as e:
            # APIError is the base class of the two exceptions above, so it must
            # come last; otherwise the more specific handlers are unreachable.
            logger.error(f"OpenAI API error: {str(e)}", exc_info=True)
            raise
        except Exception as e:
            logger.error(f"Unexpected error generating response: {str(e)}", exc_info=True)
            raise

class ModelFactory:
    def __init__(self, config):
        self.config = config

    def get_model(self, subject_area):
        model_config = self.config['model_prompt_mapping'].get(subject_area.lower(),
                                                               self.config['model_prompt_mapping']['other'])
        model_name = model_config['model']
        api_key = self.config['api']['key']

        return GPT4oModel(
            api_key=api_key,
            base_url=self.config['api']['base_url'],
            model_name=model_name,
            max_tokens=self.config['gpt']['max_tokens'],
            temperature=self.config['gpt']['temperature'],
            top_p=self.config['gpt']['top_p']
        )

image_processor.py

# -*- coding: utf-8 -*-
import base64
import os
import logging
import shutil
from logger_config import setup_logger, log_exception
import mimetypes
logger = setup_logger(__name__)

def imageBase64(path_of_image):
    try:
        with open(path_of_image, "rb") as image_file:
            encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
            mime_type, _ = mimetypes.guess_type(path_of_image)
            if mime_type is None:
                mime_type = 'application/octet-stream'
            data_url = f"data:{mime_type};base64,{encoded_string}"
            logger.info(f"Successfully encoded image: {path_of_image}")
            return data_url
    except Exception as e:
        log_exception(logger, e)
        raise

def is_image_processed(image_name, process_log_file):
    try:
        if not os.path.exists(process_log_file):
            logger.info(f"Process log file not found: {process_log_file}")
            return False
        with open(process_log_file, 'r') as file:
            processed_images = file.read().splitlines()
        is_processed = image_name in processed_images
        logger.info(f"Image {image_name} processed status: {is_processed}")
        return is_processed
    except Exception as e:
        log_exception(logger, e)
        return False

def log_processed_image(image_name, process_log_file):
    try:
        os.makedirs(os.path.dirname(process_log_file), exist_ok=True)
        with open(process_log_file, 'a') as file:
            file.write(image_name + '\n')
        logger.info(f"Logged processed image: {image_name}")
    except Exception as e:
        log_exception(logger, e)
        raise

def move_processed_image(image_path, processed_dir):
    try:
        if not os.path.exists(processed_dir):
            os.makedirs(processed_dir)

        image_name = os.path.basename(image_path)
        destination = os.path.join(processed_dir, image_name)
        shutil.move(image_path, destination)
        logger.info(f"Moved processed image to: {destination}")
    except Exception as e:
        log_exception(logger, e)
        raise
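
A quick way to sanity-check the encoder above (a standalone re-implementation of the same logic, without the `logger_config` dependency): write a small fake image file, encode it, and confirm the data URL decodes back to the original bytes.

```python
import base64
import mimetypes
import os
import tempfile

def image_to_data_url(path):
    # Same encoding logic as imageBase64, minus the logging.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    mime, _ = mimetypes.guess_type(path)
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"

payload = b"\x89PNG\r\n\x1a\nfake-image-bytes"  # not a real PNG, just test bytes
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "sample.png")
    with open(p, "wb") as f:
        f.write(payload)
    url = image_to_data_url(p)

assert url.startswith("data:image/png;base64,")
assert base64.b64decode(url.split(",", 1)[1]) == payload
```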

Runtime output after running the main program

INFO:model_factory:Prompt sent to GPT (first 50 chars): Extract the content from the image, focusing on th
INFO:model_factory:Prompt sent to GPT (first 50 chars): Extract the content from the image, focusing on th
ERROR:model_factory:OpenAI API error: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 129650 tokens. Please reduce the length of the messages. (request id: 2024090820385878799855776406962)", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
Traceback (most recent call last):
  File "E:\SS-learning-system\system-step-by-step\system-py-one-by-one\image-info\pythonProject\model_factory.py", line 64, in generate_response
    response = self.client.chat.completions.create(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hong\anaconda3\Lib\site-packages\openai\_utils\_utils.py", line 274, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hong\anaconda3\Lib\site-packages\openai\resources\chat\completions.py", line 668, in create
    return self._post(
           ^^^^^^^^^^^
  File "C:\Users\hong\anaconda3\Lib\site-packages\openai\_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\hong\anaconda3\Lib\site-packages\openai\_base_client.py", line 936, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "C:\Users\hong\anaconda3\Lib\site-packages\openai\_base_client.py", line 1040, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 128000 tokens. However, your messages resulted in 129650 tokens. Please reduce the length of the messages. (request id: 2024090820385878799855776406962)", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
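
My best guess at the root cause, based only on the code posted above: `FirstRoundInteraction.process` calls `model.generate_response(prompt, image_data_url)` without `is_image=True`, so `generate_response` falls into its text branch and `json.dumps` the entire base64 data URL into the prompt, which is what blows past the 128k-token limit. Passing `is_image=True` routes the data URL into the structured `image_url` content part instead. A simplified mirror of the branching logic shows the difference:

```python
import json

def build_messages(prompt, input_data, is_image=False):
    """Simplified mirror of the branching in GPT4oModel.generate_response."""
    if is_image and input_data and input_data.startswith('data:image'):
        # Image path: the data URL rides in a structured image_url part,
        # which the API tokenizes as an image, not as text.
        return [{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": input_data}},
        ]}]
    # Text path: the entire input (here, a huge base64 string) is inlined
    # into the prompt and counted as text tokens.
    full_prompt = prompt + "\n\nInput Data:\n" + json.dumps(input_data, ensure_ascii=False, indent=2)
    return [{"role": "user", "content": full_prompt}]

data_url = "data:image/png;base64," + "A" * 1000  # stand-in for a real image

text_msgs = build_messages("Extract the content", data_url)                  # current call shape
image_msgs = build_messages("Extract the content", data_url, is_image=True)  # corrected call shape

assert data_url in text_msgs[0]["content"]                          # base64 leaked into the text prompt
assert image_msgs[0]["content"][1]["image_url"]["url"] == data_url  # kept as a proper image part
```

If that diagnosis is right, the one-line fix in first_round_interaction.py would be `response = model.generate_response(prompt, image_data_url, is_image=True)`.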

I hope someone can tell me what knowledge I need to teach GPT so it can modify the current system. My problem right now is that I don’t know how to guide GPT to learn and improve.

Interesting move, posting that programmers’ knowledge is worthless and asking them for help in the same post.

The best way to get code out of it is to explain how it should behave. Treat it like an intern.

You have to learn it yourself and then explain to the intern how you want it done.

Not the other way around. Take a course, watch a youtube video, google it.
Like a boss.

Maybe one more trick is to ask ChatGPT for a course outline of the specific thing you want to do…

and then ask for a lesson (describe the style, e.g. a conversation where you ask a question and it gives additional background info).

This is really great! Teaching someone “how to fish” rather than “giving them a fish”. Here’s a conversation I had with my avatar. (https://youtu.be/7MMxkEYXOQY)

The basic issue was that I had to update my avatar about Batch APIs. I did so after I realized that it did not really understand the Batch API, so I updated its knowledge.

hth

There’s no doubt that programming skills are valuable, but I haven’t learned them, so I use GPT to make up for my shortcomings.

In fact, that’s exactly what I’m doing: I first describe the problem I want to solve, then let GPT provide a solution outline, and then have it program according to that outline. The reason I’ve been able to advance the project this far is entirely thanks to GPT’s help, as I have never had formal programming training.

Hello,
If you want to do complex coding without mastering the code or the processes and architecture involved, you have to trust ChatGPT to guide you. From experience, I can tell you that this is not a good solution at all: it can lead you into dead ends, and then debugging becomes almost mission impossible, especially since you lack the necessary knowledge, and I doubt any programmer would want to get stuck in it…
So, as I tell my students: the more solid your foundations in programming, the faster you will be able to move forward by using ChatGPT and GUIDING it on what you want to do, checking whether the code provided is understandable and acceptable. In short, you must always keep control! My experience on this here:

I have been studying this phenomenon of coding without coding for some time now. I really commend @hongyhbs on his effort. But more than that, the real issue is: how much time should one dedicate to learning how to program versus learning how to instruct LLMs to get what you want?

I would contend that LLMs are only going to improve from here onwards. We all deal with abstractions all the time. For example, I may know Python, but do I really understand the Python interpreter? No. Do I know the bits and bytes that drive the actual code when I write print("hello world")? No. So will it be necessary, in the foreseeable future, to learn a programming language (as it exists today)?

Based on experiences like @hongyhbs’s, I would say that programming is a talent that will be valued less and less in the foreseeable future.

However, understanding AI capabilities must also follow if you want to augment yourself. For example: understanding an AI model’s inability to fully learn an API or SDK even with 20,000 tokens of documentation, and then writing code that places images into the textual context, running the context length over the limit with base64 data instead of formatting the message so the image is properly encoded as image tokens, as evidenced in the API errors earlier.

AI can be a learning tool, and it can be an accelerator for producing within the domain of skills you possess, but it cannot be trusted.

base/
├── README.md                         
├── INSTALLATION.md                   
├── entrypoints/                      
│   ├── create-local-env.sh           
│   ├── create-local-env.bat          
├── infra/                            
│   ├── helm/                         
│   │   ├── api_gateway/              
│   │   ├── rabbitmq/                 
│   │   ├── neo4j/                    
│   │   ├── postgis/                  
│   │   ├── minio/                    
│   │   ├── vault/                    
│   │   ├── prometheus/               
│   │   ├── grafana/                  
│   │   └── values.yaml               
│   ├── pulumi/                       
│   ├── monitoring/                   
│   └── scripts/                      
├── agentm/                           
│   ├── src/                          
│   │   ├── api_gateway/              
│   │   │   ├── gateway.py            
│   │   │   ├── transformers/         
│   │   │   │   ├── pdf_transformer.py
│   │   │   │   ├── csv_transformer.py
│   │   │   │   ├── xlsx_transformer.py
│   │   │   │   ├── video_transformer.py
│   │   │   │   ├── image_transformer.py
│   │   │   │   ├── catalog_transformer.py
│   │   │   │   ├── json_transformer.py
│   │   │   │   ├── geojson_transformer.py
│   │   │   │   └── dicom_transformer.py
│   │   │   ├── task_handler.py       
│   │   │   ├── bot_interaction.py    
│   │   │   ├── reward_system.py      
│   │   │   └── github_integration.py 
│   │   ├── workflows/                
│   │   │   ├── workflow_manager.py   
│   │   │   ├── human_interaction.py  
│   │   │   ├── human_classification.py
│   │   │   ├── task_reviewer.py      
│   │   │   └── workflow_viewer.py    
│   │   ├── reward_system/            
│   │   │   ├── reward_manager.py     
│   │   │   ├── human_rewards.py      
│   │   │   └── metrics_tracker.py    
│   │   ├── core/                     
│   │   │   ├── memory_manager.py          # Manages short-term, long-term, and fantasy memory
│   │   │   ├── dream_algorithm.py         # Uses memories for creating new associations and ideas during "sleep"
│   │   │   ├── hunger_motivation.py       # Adjusts system behavior based on hunger, tiredness, etc.
│   │   │   ├── task_scheduler.py          # Schedules and prioritizes tasks for the system
│   │   │   ├── prompt_generator.py        # Generates GPT-based prompts
│   │   │   ├── prompt_builder.py          # Builds prompts for agent and task generation
│   │   │   ├── demon_process.py           # Ever-running daemon responsible for background learning
│   │   │   ├── long_term_memory.py        # Manages and updates the system's long-term memory
│   │   │   ├── short_term_memory.py       # Stores short-term information for immediate use
│   │   │   ├── fantasy_memory.py          # Creates "fantasy" memories based on creative mutations
│   │   │   └── logging_config.py          # Configures logging for all processes
│   │   ├── agents/                   
│   │   │   ├── micro_agents/         
│   │   │   │   ├── sort_list.py          
│   │   │   │   ├── filter_list.py        
│   │   │   │   ├── classify_list.py      
│   │   │   │   ├── map_list.py           
│   │   │   │   ├── summarize_list.py     
│   │   │   │   ├── grounded_answer.py    
│   │   │   │   ├── chain_of_thought.py   
│   │   │   │   ├── data_collection.py    
│   │   │   │   ├── preprocessing.py      
│   │   │   │   ├── feature_engineering.py
│   │   │   │   ├── model_training.py     
│   │   │   │   ├── model_evaluation.py   
│   │   │   │   ├── pdf_splitter.py       
│   │   │   │   ├── ocr_tesseract.py      
│   │   │   │   ├── extract_facts.py      
│   │   │   │   └── __init__.py           
│   │   │   ├── capability_agents/       
│   │   │   │   ├── gpt_agent.py          
│   │   │   │   ├── code_generator.py     
│   │   │   │   ├── debugging_agent.py    
│   │   │   │   ├── optimization_agent.py 
│   │   │   │   └── __init__.py           
│   │   ├── machinelearning/          
│   │   │   ├── ml_model_trainer.py   
│   │   │   ├── data_preprocessor.py  
│   │   │   ├── model_evaluator.py    
│   │   │   └── __init__.py           
│   │   ├── deeplearning/             
│   │   │   ├── cnn_training.py       
│   │   │   ├── rnn_training.py       
│   │   │   ├── gan_training.py       
│   │   │   ├── transformer_training.py
│   │   │   └── __init__.py           
│   │   ├── webscraping/              
│   │   │   ├── captcha_solver.py     
│   │   │   ├── crawler.py            
│   │   │   ├── task_scraper.py       
│   │   │   └── experiment_manager.py 
│   ├── tests/                        
│   ├── monitored_work/               
│   └── var/                          
│       ├── data/
│       └── logs/
│           └── agent_execution.log   
├── react-app/                        
│   ├── src/                          
│   │   ├── components/               
│   │   │   ├── videocall/            
│   │   │   ├── chat/                 
│   │   │   ├── task_viewer/          
│   │   │   ├── code_pusher/          
│   │   │   ├── workflow_viewer/      
│   │   └── api/                      
│   │       └── api_gateway.ts        
│   ├── public/                       
│   └── package.json                  
├── monitoring_agent/                 
│   ├── root_rights.py                
│   ├── code_sandbox.py               
│   ├── gatekeeper.py                 
│   └── decision_maker.py

Maybe you can ask the model to build this.
Then you won’t need to code anymore.

You are right. GPT has done this many times. Previously, I had already found the correct way to use a URL for transmission, but it stubbornly changed the correct method back to an incorrect direct transfer. My solution is to establish an unmodifiable principle for it; every time a modification is required, I remind GPT of this principle. That way, the problem gets resolved.

Thank you for your reply. The greatness of GPT lies in the fact that it allowed me, someone who had never written a line of code before, to independently complete a system. The linear version of the system runs successfully, but it takes 5 hours; now I need to transform it into a concurrent one.

My experience is that humans do the creative thinking: build the framework of what you want to achieve, then ask GPT to help you implement it step by step. You can’t rely on an LLM to do everything at once; rather, on the foundation of the framework, improve it one component at a time. You’ll then find that GPT has done most of the technical work for you.

Of course, strong technical skills are very valuable. For example, many friends on the forum have helped me solve technical issues, such as transmitting base64 encodings as URLs. Like any other technology, LLMs develop gradually; we don’t need to wait until the technology is fully mature to start. Overcoming difficulties is precisely how we progress.

Thank you very much. I will try it.

I had GPT explain the purpose of this system. I can say that GPT is unable to complete such work all at once; assembling a task this complex exceeds GPT’s current capabilities. For my part, though, I only need simple tools, or at least less complicated ones, so I break down the functions I need and have GPT complete one part at a time. This way I can advance my system quickly.

Because our needs differ, the tasks I ask GPT to do are different: what I need is a reliable, easy-to-use, simple system, while yours is a complex and efficient one. In my opinion, it is precisely because of technically skilled and creative people like you that technology advances, allowing ordinary people like me, without training, to benefit from it.