The scope of this question…is the former use-case.
What My Code Looks Like
For that use case, my code looks like:
public class OpenAIUtils {
    // ...business logic..
    public String solveCaptcha(File captchaFile) {
        ChatRequest request = ChatRequest.builder()
            .model('gpt-4o')
            // .message(SystemMessage.of("You have been tasked with reading this image."))
            .message(UserMessage.of([
                ContentPartText.of("""
                    You have been tasked with reading this image.
                    Type out the letters you see in it"""),
                ContentPartImageUrl.of(loadImageAsBase64(captchaFile.getPath())),
            ]))
            .build()
        return openAI.chatCompletions()
            .create(request)
            .join()
            .firstContent();
    }
    // ...more business logic...
}
What happens when you run this function with a valid File?
In any case, it might be a good idea to capture the raw HTTP response you get from OpenAI. If you can't figure it out yourself, you may need to talk to the library's developer. I'd always suggest running a raw curl request to see what's what.
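If curl is not convenient, roughly the same check can be done from the JVM. The following is only a sketch, assuming Java 11+ and the public chat completions endpoint; `RawChatRequestSketch`, `buildRequest`, and `sendAndDump` are hypothetical names made up for illustration and are not part of any OpenAI client library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawChatRequestSketch {
    // Chat completions endpoint of the OpenAI REST API.
    static final String ENDPOINT = "https://api.openai.com/v1/chat/completions";

    /** Builds the POST request without sending it (hypothetical helper). */
    public static HttpRequest buildRequest(String apiKey, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    /** Sends the request and returns the status code plus the unparsed body. */
    public static String sendAndDump(String apiKey, String jsonBody)
            throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(apiKey, jsonBody), HttpResponse.BodyHandlers.ofString());
        // Returning the raw text makes error payloads visible instead of hiding
        // them behind a client library's response mapping.
        return response.statusCode() + "\n" + response.body();
    }
}
```

Printing the result of `sendAndDump(System.getenv("OPENAI_API_KEY"), body)` shows whether the API rejected the request or whether the problem sits in the wrapper code.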
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You have been tasked with reading this image. "
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Type out the letters you see in it, and only respond with the letters."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "${ImageUtils.LoadImageAsBase64(file.getPath())}"
          }
        }
      ]
    }
  ]
}
And here is the ImageUtils.LoadImageAsBase64() definition:
final String prefix = "data:image/" + extension + ";base64,";
For safety, you should not assume that:
- it's an image
- the file extension represents the MIME type
Instead, you should use a MIME type guesser and pass its result directly.
For example, image.jpg is NOT image/jpg but image/jpeg (yes, it does matter).
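Here is a minimal sketch of that idea in plain Java. `DataUrlSketch` and `toDataUrl` are hypothetical names, not the poster's actual ImageUtils. It relies on the JDK's `URLConnection.guessContentTypeFromStream`, which inspects the leading bytes and only recognizes a handful of formats (PNG, GIF, and JPEG among them), so a dedicated detection library may be a better fit for anything beyond that.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class DataUrlSketch {
    /** Builds a data URL, taking the MIME type from the content, not the file name. */
    public static String toDataUrl(byte[] bytes) {
        String mimeType;
        try {
            // Looks at the first few bytes (the "magic number") of the content.
            mimeType = URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Refuse anything that was not positively identified as an image.
        if (mimeType == null || !mimeType.startsWith("image/")) {
            throw new IllegalArgumentException("Not a recognized image type: " + mimeType);
        }
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
    }

    /** Convenience overload that reads the file first. */
    public static String toDataUrl(Path file) {
        try {
            return toDataUrl(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the type comes from the bytes themselves, a JPEG saved as image.jpg ends up with the image/jpeg prefix, and a non-image file is rejected before it is ever sent.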
Lastly, you are returning an ImageUrl object rather than a String. If you have implemented an implicit conversion to String, ignore this; otherwise, that may also be your issue.
Lastly lastly, you shouldn’t be using GPT-4V to read captchas. You can get your account banned for doing this.
But at least this question teaches me a little bit about the OpenAI API and how to use it.
UPDATE: I was able to get it to work by making sure that my util method returns a String, and by having my raw-request method return response.choices[0].message.content
That being said, I will scrap any further work on the original goal, for the reasons stated here. This does teach me how to use the API for other things…