I have been trying to upload a file using the Assistants API in Java. I have tried openai-java and simple-openai and I keep getting different errors; I don't know if it's the API, but I can't get it to work. I do the upload with purpose "assistants", but when I add the file to the message thread I get errors.
For reference, I'm uploading a selfie and a picture ID to validate the identity of the user, and then I'm going to pass them to a function that saves them into a database.
If anybody has had any luck with uploads, any guidance is very much appreciated.
Do you have error logs to share? Errors (and logs) are usually pretty useful for getting an idea of what is going on, and your message doesn't include any details about them.
The first thing: decide whether you are using the file for computer vision, or sending it to Code Interpreter to be used in its Python environment.
If it is for image recognition with vision, then the uploaded file's purpose needs to be "vision", not "assistants".
Then you can construct a vision user message with a text part and an image part that references the file ID.
Thread thread =
        // TODO: Update this example once we support `.create()` without arguments.
        client.beta().threads().create(BetaThreadCreateParams.builder().build());

client.beta()
        .threads()
        .messages()
        .create(BetaThreadMessageCreateParams.builder()
                .threadId(thread.id())
                .role(BetaThreadMessageCreateParams.Role.USER)
                .content("I need to solve the equation `3x + 11 = 14`. Can you help me?")
                .build());
You'll need to replace the plain content string with a multi-part, typed content object (a text part plus an image part). The Java SDK docs don't have direct examples of this.
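Since you mentioned trying simple-openai as well, here is a rough sketch of what the vision-purpose upload looks like with that client (the openai-java builder names differ, so adapt accordingly); the file path is a placeholder:

import java.nio.file.Paths;

import io.github.sashirestela.openai.SimpleOpenAI;
import io.github.sashirestela.openai.domain.file.FileRequest;
import io.github.sashirestela.openai.domain.file.FileRequest.PurposeType;

// Upload the image with purpose "vision" so it can later be referenced
// by file ID inside a multi-part thread message.
var openAI = SimpleOpenAI.builder()
        .apiKey(System.getenv("OPENAI_API_KEY"))
        .build();
var fileRequest = FileRequest.builder()
        .file(Paths.get("selfie.jpg"))   // placeholder path
        .purpose(PurposeType.VISION)     // not the "assistants" purpose
        .build();
var fileId = openAI.files().create(fileRequest).join().getId();
System.out.println("Uploaded file ID: " + fileId);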
I'm uploading two images, one selfie and one picture ID. They should be passed as byte arrays to a function; the function saves the collected information to a database and runs a validation process.
When I pass the image as a string representation inside the message content, the run fails with "Request too large for gpt-4o", so inlining the image in multi-part content won't work.
Have you checked whether the file upload response returns a valid file ID before you add it to the thread?
Maybe it's an issue with how the file is being referenced in the message payload.
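Something like this (a rough sketch, assuming a simple-openai file response) would make that check explicit before the ID goes into the thread:

// Assumes 'fileResponse' is the object returned by the files().create(...) call.
var fileId = fileResponse.getId();
if (fileId == null || !fileId.startsWith("file-")) {
    throw new IllegalStateException("Upload did not return a usable file ID: " + fileId);
}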
I found a workaround: if I don't upload the file but instead use a local file, as in the Chat Completions example, it works:
var chatRequest = ChatRequest.builder()
        .model("gpt-4o-mini")
        .messages(List.of(
                UserMessage.of(List.of(
                        ContentPartText.of(
                                "What do you see in the image? Give in details in no more than 100 words."),
                        ContentPartImageUrl.of(ImageUrl.of(
                                Base64Util.encode("src/demo/resources/machupicchu.jpg", MediaType.IMAGE)))))))
        .temperature(0.0)
        .maxCompletionTokens(500)
        .build();
var chatResponse = openAI.chatCompletions().createStream(chatRequest).join();
chatResponse.filter(chatResp -> chatResp.getChoices().size() > 0 && chatResp.firstContent() != null)
        .map(Chat::firstContent)
        .forEach(System.out::print);
But if I try to use an uploaded file for vision, I get an error with the following code:
var chatRequest = ChatRequest.builder()
        .model("gpt-4o-mini")
        .messages(List.of(
                ChatMessage.UserMessage.of(List.of(
                        ContentPart.ContentPartText.of(
                                "You are a chat assistant to an ai assistant, the response you give will go to the assistant who will then send it back to the user, start your response with {{ The uploaded image contains }}. What do you see in the image? Give in details no more than 30 words."),
                        ContentPart.ContentPartImageFile.of(ContentPart.ContentPartImageFile.ImageFile.of(fileid))))))
        .temperature(0.0)
        .maxCompletionTokens(500)
        .build();
var chatResponse = openAI.chatCompletions().createStream(chatRequest).join();
chatResponse.filter(chatResp -> chatResp.getChoices().size() > 0 && chatResp.firstContent() != null)
        .map(Chat::firstContent)
        .forEach(sb::append);
I see that you are mixing two completely different things: Chat Completions and the Assistants API are different approaches, and they shouldn't be mixed.
Trying to interpret your messages, I think you want to use the Assistants API with the vision feature and uploaded images to be compared, right?
If that is the case, I have prepared an example using simple-openai:
Example Code
package io.github.sashirestela.openai.demo;
import java.nio.file.Paths;
import java.util.List;
import io.github.sashirestela.openai.SimpleOpenAI;
import io.github.sashirestela.openai.common.content.ContentPart.ContentPartImageFile;
import io.github.sashirestela.openai.common.content.ContentPart.ContentPartImageFile.ImageFile;
import io.github.sashirestela.openai.common.content.ContentPart.ContentPartText;
import io.github.sashirestela.openai.common.content.ContentPart.ContentPartTextAnnotation;
import io.github.sashirestela.openai.common.content.ImageDetail;
import io.github.sashirestela.openai.domain.assistant.AssistantRequest;
import io.github.sashirestela.openai.domain.assistant.ThreadMessageDelta;
import io.github.sashirestela.openai.domain.assistant.ThreadMessageRequest;
import io.github.sashirestela.openai.domain.assistant.ThreadMessageRole;
import io.github.sashirestela.openai.domain.assistant.ThreadRunRequest;
import io.github.sashirestela.openai.domain.assistant.events.EventName;
import io.github.sashirestela.openai.domain.file.FileRequest;
import io.github.sashirestela.openai.domain.file.FileRequest.PurposeType;
import io.github.sashirestela.openai.domain.file.FileResponse;
public class Example {

    private SimpleOpenAI openAI;
    private String fileIdSelfie;
    private String fileIdPictureId;
    private String assistantId;
    private String threadId;

    public Example() {
        openAI = SimpleOpenAI.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .build();
    }
    public void run() {
        var question = "Tell me if both images correspond to the same person";
        System.out.println("Question: " + question);

        // Upload both images with purpose VISION so they can be referenced by file ID.
        fileIdSelfie = uploadFile("src/demo/resources/sam_selfie.jpg", PurposeType.VISION).getId();
        fileIdPictureId = uploadFile("src/demo/resources/sam_id.jpg", PurposeType.VISION).getId();

        assistantId = openAI.assistants().create(AssistantRequest.builder()
                .model("gpt-4o")
                .instructions("You are an expert comparing images")
                .build())
                .join().getId();

        threadId = openAI.threads().create()
                .join().getId();

        // One user message: the question plus both images as image-file content parts.
        openAI.threadMessages().create(threadId, ThreadMessageRequest.builder()
                .role(ThreadMessageRole.USER)
                .content(List.of(
                        ContentPartText.of(question),
                        ContentPartImageFile.of(ImageFile.of(fileIdSelfie, ImageDetail.LOW)),
                        ContentPartImageFile.of(ImageFile.of(fileIdPictureId, ImageDetail.LOW))))
                .build())
                .join();

        // Run the assistant on the thread and stream the response events.
        var responseStream = openAI.threadRuns().createStream(threadId, ThreadRunRequest.builder()
                .assistantId(assistantId)
                .build())
                .join();
        System.out.print("Answer: ");
        responseStream.forEach(e -> {
            switch (e.getName()) {
                case EventName.THREAD_MESSAGE_DELTA:
                    var messageDeltaFirstContent = ((ThreadMessageDelta) e.getData()).getDelta().getContent().get(0);
                    if (messageDeltaFirstContent instanceof ContentPartTextAnnotation) {
                        System.out.print(((ContentPartTextAnnotation) messageDeltaFirstContent).getText().getValue());
                    }
                    break;
                default:
                    break;
            }
        });
        System.out.println();
    }
    public void clean() {
        openAI.files().delete(fileIdSelfie).join();
        openAI.files().delete(fileIdPictureId).join();
        openAI.assistants().delete(assistantId).join();
        openAI.threads().delete(threadId).join();
        System.out.println("\nAll resources were deleted");
    }

    private FileResponse uploadFile(String filePath, PurposeType purpose) {
        var fileRequest = FileRequest.builder()
                .file(Paths.get(filePath))
                .purpose(purpose)
                .build();
        return openAI.files().create(fileRequest).join();
    }

    public static void main(String[] args) {
        var example = new Example();
        example.run();
        example.clean();
    }
}
Example Output
Question: Tell me if both images correspond to the same person
Answer: Yes, both images correspond to the same person, Sam Altman.
All resources were deleted
Moreover, simple-openai has demo code for vision on thread messages in its repository.
Thanks for the examples, I'll try them to see if they work out. Let me give you a little context on what I'm doing and how I got it to work.
I got it working with Chat Completions and the assistant together. I know the APIs shouldn't be mixed, so let me explain what I'm doing. I have a yard management software and I'm currently adding a new feature: using a WhatsApp channel, I'm pre-registering drivers who are going to pick up or deliver. After a series of questions I get the required information from the drivers; the selfie and the picture of their driver's license are the last step, and after I validate both I add the driver to a database using a function.
How I got it to work: first I collect all the info using the assistant. When the images come in from WhatsApp I download each one and upload it to the file store as a vision file, so I can download it again at the end from the function before saving to the database (I pass the file ID). I use Chat Completions to validate the image I receive, and I pass the Chat Completions response into the assistant. It works perfectly; actually I think it's better, since I don't have to add more instructions to the assistant and I can define specific tasks for Chat Completions and pass their results into the assistant as additional assistant messages.
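For anyone wiring things up the same way, here is a rough sketch of the glue between the two APIs with simple-openai. The model, prompt, and file path are placeholders, `openAI` and `threadId` are assumed to come from the assistant setup shown earlier in the thread, and the ASSISTANT thread-message role is an assumption to verify against your library version:

// (imports are the same as in the Chat Completions and thread-message snippets above)

// 1) Validate the incoming image with Chat Completions, as in the workaround above.
var validationRequest = ChatRequest.builder()
        .model("gpt-4o-mini")
        .messages(List.of(UserMessage.of(List.of(
                ContentPartText.of(
                        "Describe the uploaded image in no more than 30 words."),
                ContentPartImageUrl.of(ImageUrl.of(
                        Base64Util.encode("driver_selfie.jpg", MediaType.IMAGE)))))))  // placeholder path
        .temperature(0.0)
        .maxCompletionTokens(300)
        .build();
var validationText = openAI.chatCompletions()
        .create(validationRequest)   // non-streaming variant
        .join()
        .firstContent();

// 2) Feed the validation result back into the assistant's thread so the assistant
//    can continue the WhatsApp conversation with that context.
//    (Assumes ThreadMessageRole.ASSISTANT exists in your simple-openai version.)
openAI.threadMessages().create(threadId, ThreadMessageRequest.builder()
        .role(ThreadMessageRole.ASSISTANT)
        .content(List.of(ContentPartText.of("The uploaded image contains: " + validationText)))
        .build())
        .join();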