Server error when trying to use Chat Completion with vision

mwarren04011990 · July 15, 2024, 5:19am

Goal

I am using the OpenAI API from a Groovy code base, using the simple-openai library.

I have two uses for it:

to get the characters out of an image
to scrape target data from a PDF file

The scope of this question…is the former use-case.

What My Code Looks like

For that use case, my code looks like:

public class OpenAIUtils { 
	// ...business logic..
	public String solveCaptcha(File captchaFile) {
		ChatRequest request = ChatRequest.builder()
			.model('gpt-4o')
//			.message(SystemMessage.of("You have been tasked with reading this image."))
			.message(UserMessage.of([
				ContentPartText.of("""
You have been tasked with reading this image.

Type out the letters you see in it"""),
				ContentPartImageUrl.of(loadImageAsBase64(captchaFile.getPath())),
			]))
			.build()
			
		return openAI.chatCompletions()
			.create(request)
			.join()
			.firstContent();
	}
	// ...more business logic...
}

What happens when you run this function, with a valid `File` ?

I get a system_error…

idk if I am using the wrong model, or what…

Diet · July 15, 2024, 12:20pm

It’s possible that you’re hitting a usage-policy guardrail.

It’s vague, but theoretically includes solving captchas.

https://openai.com/policies/usage-policies/

In any case, it might be a good idea to get the raw HTTP response that OpenAI sends back. If you can’t figure it out yourself from that, you may need to talk to the library’s developer. I’d always suggest running a raw curl request to see what’s what.

mwarren04011990 · July 15, 2024, 3:29pm

When I build the request manually (without the aid of a helper library), I face the following error:

[error:[message:Invalid image., type:invalid_request_error, param:null, code:invalid_image]]

Diet · July 15, 2024, 4:01pm

that doesn’t look like an http response to me

there’s typically more information than that in the response…

but you can check an official example here https://platform.openai.com/docs/guides/vision

and see if you can send the same request from jupyter or something.

mwarren04011990 · July 15, 2024, 5:22pm

Here is the request body json:

{
	"model": "gpt-4o",
	"messages": [
		{
			"role": "system",
			"content": "You have been tasked with reading this image. "
		},
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Type out the letters you see in it, and only respond with the letters."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "${ImageUtils.LoadImageAsBase64(file.getPath())}"
					}
				}
			]
		}
	]
}

and here is the ImageUtils.LoadImageAsBase64() definition:

	public static ImageUrl LoadImageAsBase64(String imagePath) {
		try {
			Path path = Paths.get(imagePath);
			byte[] imageBytes = Files.readAllBytes(path);
			String base64String = Base64.getEncoder().encodeToString(imageBytes);
			final String extension = imagePath.substring(imagePath.lastIndexOf('.') + 1);
			final String prefix = "data:image/" + extension + ";base64,";
			return ImageUrl.of(prefix + base64String);
		} catch (Exception e) {
			e.printStackTrace();
			return null;
		}
	}

RonaldGRuckus · July 15, 2024, 5:32pm

final String prefix = "data:image/" + extension + ";base64,";

For safety you should not assume that:

it’s an image
the file extension represents the mime type.

Instead you should use a mime_type guesser and pass that directly.

For example.
image.jpg IS NOT image/jpg but instead image/jpeg (Yes, it does matter)
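
A minimal sketch of that advice in Java, assuming the goal is a base64 data URL for the `image_url` field. Instead of trusting the file extension, it sniffs the leading bytes with the JDK's `URLConnection.guessContentTypeFromStream`, which recognizes common formats such as PNG, JPEG, and GIF and returns null for content it does not know. The class and method names (`ImageDataUrl`, `toDataUrl`) are hypothetical and not part of simple-openai.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class ImageDataUrl {

    // Hypothetical helper: builds a data URL from raw bytes,
    // taking the MIME type from the content rather than the file name.
    public static String toDataUrl(byte[] imageBytes) {
        String mimeType;
        try {
            // ByteArrayInputStream supports mark/reset, which the sniffer requires.
            mimeType = URLConnection.guessContentTypeFromStream(
                    new ByteArrayInputStream(imageBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Reject anything that is not recognizably an image instead of guessing.
        if (mimeType == null || !mimeType.startsWith("image/")) {
            throw new IllegalArgumentException("Not a recognized image type: " + mimeType);
        }
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(imageBytes);
    }

    // Convenience overload for a file on disk.
    public static String toDataUrl(Path path) throws IOException {
        return toDataUrl(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(toDataUrl(Path.of(args[0])));
        }
    }
}
```

Because the method returns a plain String, the result can be placed directly in the `url` field of the request body. Formats the JDK sniffer does not recognize are rejected here rather than guessed.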

Lastly you are returning a class ImageUrl. If you have implemented an implicit conversion to String then ignore this, otherwise that also may be your issue.
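
To illustrate the point about returning a class rather than a String, here is a small Java sketch. `ImageRef` is a hypothetical stand-in for a wrapper type; it is not simple-openai's `ImageUrl`, whose actual string form is not shown in this thread. Embedding an object in a string calls its `toString()` (Groovy's `"${...}"` interpolation does the same), and that is generally not the data URL itself.

```java
public class ToStringPitfall {

    // Hypothetical wrapper type with no toString() override.
    public static final class ImageRef {
        private final String url;

        public ImageRef(String url) {
            this.url = url;
        }

        public String getUrl() {
            return url;
        }
    }

    // Embeds the object itself: the JSON ends up containing the
    // object's toString() output, not the data URL.
    public static String embedObject(ImageRef ref) {
        return "{\"url\": \"" + ref + "\"}";
    }

    // Embeds the underlying String, which is what the API expects.
    public static String embedUrl(ImageRef ref) {
        return "{\"url\": \"" + ref.getUrl() + "\"}";
    }

    public static void main(String[] args) {
        ImageRef ref = new ImageRef("data:image/png;base64,AAAA");
        System.out.println(embedObject(ref));
        System.out.println(embedUrl(ref));
    }
}
```

If the helper returns a String in the first place, the two cases collapse into one and the interpolated request body carries the data URL as intended.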

Lastly lastly, you shouldn’t be using GPT-4V to read captchas. You can get your account banned for doing this.

mwarren04011990 · July 15, 2024, 5:52pm

I guess I won’t do this.

But at least this question teaches me a little bit about the OpenAI API and how to use it.

UPDATE: I was able to get it to work by making sure that my util method returns a String, and by having my raw-request method return response.choices[0].message.content.

That being said, I will scrap any more work on the original goal, for reasons stated on here. This does teach me how to use the API for other things…
