Server error when trying to use Chat Completion with vision

Goal

I am using the OpenAI API from a Groovy codebase via the simple-openai library.

I have two uses for it:

  • to get the characters out of an image
  • to scrape target data from a PDF file

This question is about the first use case.

What My Code Looks Like

For that use case, my code looks like:

public class OpenAIUtils { 
	// ...business logic..
	public String solveCaptcha(File captchaFile) {
		ChatRequest request = ChatRequest.builder()
			.model('gpt-4o')
//			.message(SystemMessage.of("You have been tasked with reading this image."))
			.message(UserMessage.of([
				ContentPartText.of("""
You have been tasked with reading this image.

Type out the letters you see in it"""),
				ContentPartImageUrl.of(loadImageAsBase64(captchaFile.getPath())),
			]))
			.build()
			
		return openAI.chatCompletions()
			.create(request)
			.join()
			.firstContent();
	}
	// ...more business logic...
}

What happens when you run this function with a valid File?

I get a system_error.

I don’t know whether I’m using the wrong model or something else is wrong.

It’s possible that you’re hitting a usage-policy guardrail.

The policy is vague, but it could arguably cover solving captchas.

https://openai.com/policies/usage-policies/

In any case, it’s a good idea to look at the raw HTTP response you get from OpenAI. If you can’t figure it out yourself, you may need to talk to the library’s developer. I’d always suggest sending a raw curl request to see what’s going on.
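In the same spirit as the curl suggestion, here is a minimal sketch that uses only the JDK’s java.net.http.HttpClient (Java 11+) to send a request and dump everything that comes back. It assumes the API key is in an OPENAI_API_KEY environment variable and that requests go to the public https://api.openai.com/v1/chat/completions endpoint; the RawChatRequest class and buildRequest method are hypothetical names. Printing the status code, headers, and unparsed body shows the full error object, not just whatever a wrapper library chooses to surface.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class RawChatRequest {
    // Public Chat Completions endpoint; change it if you go through a proxy.
    static final String ENDPOINT = "https://api.openai.com/v1/chat/completions";

    /** Builds a POST request carrying an already-serialized JSON body. */
    static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the key is stored in the OPENAI_API_KEY environment variable.
        String apiKey = System.getenv("OPENAI_API_KEY");
        if (apiKey == null) {
            System.err.println("OPENAI_API_KEY is not set");
            return;
        }
        // Replace with the exact JSON body your Groovy code produces.
        String body = "{\"model\": \"gpt-4o\", "
                + "\"messages\": [{\"role\": \"user\", \"content\": \"Say hello\"}]}";

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(apiKey, body), HttpResponse.BodyHandlers.ofString());

        // Print status, headers and the unparsed body; error details live in the body.
        System.out.println("HTTP " + response.statusCode());
        response.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
        System.out.println(response.body());
    }
}
```

Pasting the body your code generates into the body string lets you check whether the failure comes from the request itself or from the library.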


When I build the request manually (without a helper library), I get the following error:

[error:[message:Invalid image., type:invalid_request_error, param:null, code:invalid_image]]

That doesn’t look like a raw HTTP response to me.

There’s typically more information than that in the response.

But you can check the official example at https://platform.openai.com/docs/guides/vision

and see whether you can send the same request from Jupyter or a similar tool.

Here is the request body JSON:

{
    "model": "gpt-4o",
    "messages": [
        {
            "role": "system",
            "content": "You have been tasked with reading this image. "
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Type out the letters you see in it, and only respond with the letters."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "${ImageUtils.LoadImageAsBase64(file.getPath())}"
                    }
                }
            ]
        }
    ]
}

And here is the definition of ImageUtils.LoadImageAsBase64():

	public static ImageUrl LoadImageAsBase64(String imagePath) {
		try {
			Path path = Paths.get(imagePath);
			byte[] imageBytes = Files.readAllBytes(path);
			String base64String = Base64.getEncoder().encodeToString(imageBytes);
			final String extension = imagePath.substring(imagePath.lastIndexOf('.') + 1);
			final String prefix = "data:image/" + extension + ";base64,";
			return ImageUrl.of(prefix + base64String);
		} catch (Exception e) {
			e.printStackTrace();
			return null;
		}
	}
final String prefix = "data:image/" + extension + ";base64,";

For safety, you should not assume that:

  1. it’s an image, or
  2. the file extension represents the MIME type.

Instead, you should use a MIME-type guesser and pass its result directly.

For example, the MIME type for image.jpg IS NOT image/jpg but image/jpeg (yes, it does matter).
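One way to follow this advice in Java is to check the file’s leading "magic" bytes instead of trusting its extension. The sketch below is a hand-rolled check for PNG, JPEG, GIF, and WEBP, which are the formats OpenAI’s vision guide listed as supported at the time of writing. The ImageMimeSniffer class is hypothetical. A library such as Apache Tika, or the JDK’s Files.probeContentType (whose behavior is platform-dependent), could be used instead. An empty result means "not a recognized image", which also covers the first point.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

final class ImageMimeSniffer {
    private ImageMimeSniffer() {}

    /** Returns the MIME type implied by the leading bytes, or empty if not a known image type. */
    static Optional<String> sniff(byte[] b) {
        if (startsWith(b, 0, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return Optional.of("image/png");
        }
        if (startsWith(b, 0, 0xFF, 0xD8, 0xFF)) {
            return Optional.of("image/jpeg"); // "jpeg", never "jpg"
        }
        if (startsWith(b, 0, 'G', 'I', 'F', '8')) { // GIF87a or GIF89a
            return Optional.of("image/gif");
        }
        // WEBP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
        if (startsWith(b, 0, 'R', 'I', 'F', 'F') && startsWith(b, 8, 'W', 'E', 'B', 'P')) {
            return Optional.of("image/webp");
        }
        return Optional.empty();
    }

    /** Convenience overload that reads the whole file first. */
    static Optional<String> sniff(Path path) throws IOException {
        return sniff(Files.readAllBytes(path));
    }

    // True if b holds the given unsigned byte values starting at offset.
    private static boolean startsWith(byte[] b, int offset, int... signature) {
        if (b.length < offset + signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((b[offset + i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Reject the file when the result is empty, and use the returned type in the data: prefix instead of the file extension.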

Lastly, you are returning an ImageUrl object. If you have implemented an implicit conversion to String, ignore this; otherwise, that may also be your issue.
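Here is a minimal sketch of the loader returning a plain String data URL instead of an ImageUrl object. It assumes the caller already knows the MIME type, for example from a content check like the one described above. The DataUrls class and the loadImageAsDataUrl name are hypothetical. The sketch also lets I/O errors propagate rather than printing a stack trace and returning null, which would otherwise silently produce a bad request.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class DataUrls {
    private DataUrls() {}

    /** Encodes raw bytes as a data URL of the form "data:<mime>;base64,<payload>". */
    static String toDataUrl(byte[] bytes, String mimeType) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    /**
     * Hypothetical helper: reads a file and returns it as a data URL String,
     * ready to drop into the "url" field of a raw JSON request.
     * The caller supplies the MIME type rather than deriving it from the extension.
     */
    static String loadImageAsDataUrl(Path path, String mimeType) throws IOException {
        // I/O errors propagate to the caller instead of being swallowed.
        return toDataUrl(Files.readAllBytes(path), mimeType);
    }
}
```

With a String return value, the "${...}" interpolation in the raw JSON body inserts the data URL itself rather than the ImageUrl object’s string representation.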

Lastly lastly, you shouldn’t be using GPT-4V to read captchas; you can get your account banned for doing this.

I guess I won’t do this.

But at least this question teaches me a little bit about the OpenAI API and how to use it.

UPDATE: I was able to get it to work by making sure that my util method returns a String, and by having my raw-request method return response.choices[0].message.content.

That said, I will drop any further work on the original goal for the reasons stated here. This does teach me how to use the API for other things, though.