Better Understand Images / Train On Annotated Images

People seem very interested. I'll drop an update as soon as I can. Also, the reason I asked whether we could upload annotated images or something similar to help it identify damage is that we have a couple of terabytes of 360° images of almost every car, in both good and bad condition. They're all high-quality photos taken with a Sony A7R3, so if it's possible to help it understand the damage, we have the data.

Quite a shame it can only process 4 images at a time right now; hopefully this limit will grow over time, just as the token context window has.

I share your sentiment about the current image-processing limit and your hope that capacity will scale the way the token context window has. That said, the limitation you mention can be partially worked around by understanding the tier-based rate limits and making parallel asynchronous calls.

At higher tiers, such as Tier 4, which I'm currently on, the rate limits across models are considerably more generous. (As a side note, DALL-E 2 and DALL-E 3 are image-generation models; for analyzing images, the relevant model is gpt-4-vision-preview.) That model allows 300 requests per minute (RPM), subject to a daily cap, which technically means you could have up to 300 image requests in flight simultaneously, if your system can handle it and you stay within the daily limit. That dramatically increases how much data you can process at any given time.
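To put rough numbers on that: if each request carries up to 4 images (the per-request limit mentioned above) and you sustain the full 300 RPM, the ceiling works out to 300 × 4 = 1,200 images per minute, daily cap and bandwidth permitting. Treat that as a back-of-the-envelope figure, since the actual limits depend on your tier.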

Parallel asynchronous calls are the key here. With asynchronous programming, you can send multiple requests at the same time instead of waiting for each one to complete before sending the next, which lets you make full use of your rate limit. Imagine initiating 300 image-processing requests at once: the throughput could be phenomenal, bandwidth permitting.
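Here's a minimal sketch of what that can look like in Python, assuming the official openai SDK (v1.x) with an OPENAI_API_KEY in the environment; the prompt, URLs, and concurrency limit are placeholders you'd tune to your own tier and use case:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Cap the number of in-flight requests so a burst stays under your RPM limit.
semaphore = asyncio.Semaphore(300)

async def describe_image(url: str) -> str:
    async with semaphore:
        response = await client.chat.completions.create(
            model="gpt-4-vision-preview",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe any visible damage on this car."},
                    {"type": "image_url", "image_url": {"url": url}},
                ],
            }],
            max_tokens=300,
        )
        return response.choices[0].message.content

async def main(urls: list[str]) -> list[str]:
    # Fire off all requests at once; gather awaits them concurrently.
    return await asyncio.gather(*(describe_image(u) for u in urls))

if __name__ == "__main__":
    # Hypothetical URLs; substitute your own hosted images.
    urls = ["https://example.com/car1.jpg", "https://example.com/car2.jpg"]
    for description in asyncio.run(main(urls)):
        print(description)
```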

Of course, this method burns through the cap quickly if you're not careful, as you've pointed out. It's a powerful strategy, but it requires managing your requests mindfully so you don't exhaust your daily limit too soon. Even so, it shows there's real scalability in the current system while we wait for the limits themselves to be raised.
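One practical note on the sketch above: the semaphore only caps concurrency, not total volume. If the daily cap is the concern, a simple running counter works, and if I recall correctly the API responses also carry x-ratelimit-* headers (e.g. x-ratelimit-remaining-requests) you can inspect to know when to back off.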

So while there’s a current limit on how many images can be processed at a time, the tier system and the ability to make parallel asynchronous calls offer a pathway to maximizing what you can achieve within those constraints. It’s a bit like having a wide pipe for data flow; you can push a lot through at once, provided you’re aware of the cap and manage your usage accordingly.

Hope this sheds some light on how to leverage the current system to its fullest!