`FileSearch` conversation fails in a loop?

Background

Using the simple-openai library, I have implemented an OpenAI util method for parsing a court case complaint PDF.

I wrote a Test Case against it and ran it multiple times, individually; it passes every time I run it.

Ok, what’s the problem?!

However, when I call this Test Case in a wait-loop:

for (int i = 0; i < 10; i++) { 
	WebUI.callTestCase(findTestCase("Test Cases/Unit Tests/Parse PDF File"), null)
	WebUI.delay(5);
}

…it fails after a few iterations, and the failure happens because we somehow don’t get any response text back.

Console logs

Here is the console info that I see after running it:

2024-07-17 23:37:50.842 INFO c.k.katalon.core.main.TestCaseExecutor - --------------------
2024-07-17 23:37:50.844 INFO c.k.katalon.core.main.TestCaseExecutor - START Test Cases/Unit Tests/Parse PDF File Consistent data
2024-07-17 23:37:51.164 DEBUG testcase.Parse PDF File Consistent data - 1: for ([i = 0, i < 10, (i++)])
2024-07-17 23:37:51.167 DEBUG testcase.Parse PDF File Consistent data - 1: callTestCase(findTestCase("Test Cases/Unit Tests/Parse PDF File"), null)
2024-07-17 23:37:51.326 INFO c.k.katalon.core.main.TestCaseExecutor - --------------------
2024-07-17 23:37:51.326 INFO c.k.katalon.core.main.TestCaseExecutor - CALL Test Cases/Unit Tests/Parse PDF File
2024-07-17 23:37:51.419 DEBUG testcase.Parse PDF File - 1: foreclosureCaseModel = GetInstance().parseCourtCasePdf(new java.io.File($WebDriverUtils.GetDownloadDirectory()/ECF Complaint.pdf.pdf))
2024-07-17 23:37:53.106 INFO com.kms.katalon.core.util.KeywordUtil - Thread created with id: thread_7PDKrxEgus9Rc6uj9CFy0Xch
=====>> Thread Run: id=run_ICKoSlz9bmfDVuY3tHhm5U7R, status=QUEUED
Based on the extracted information from the provided PDF, here is the data in JSON format:

``` { "caseNumber": "49D01-2404-MF-018805", "owner1Name": "Terra Property QOZ Fund III LLC", "owner2Name": "Yohan Naraine", "mailingAddress": "504 Main Street, Beech Grove, Indiana 46107", "propertyAddress": "3133-3135 Sutherland Avenue, Indianapolis, Indiana 46205" } ```
This information was extracted from the following sections of the document:

- Case Number: "49D01-2404-MF-018805"【4:1†source】
- Owner 1 Name: "Terra Property QOZ Fund III LLC"【4:1†source】
- Owner 2 Name: "Yohan Naraine"【4:1†source】
- Mailing Address: "504 Main Street, Beech Grove, Indiana 46107"【4:1†source】
- Property Address: "3133-3135 Sutherland Avenue, Indianapolis, Indiana 46205"【4:0†source】【4:1†source】
=====>> Thread Run: id=run_ICKoSlz9bmfDVuY3tHhm5U7R, status=COMPLETED
2024-07-17 23:38:08.184 DEBUG testcase.Parse PDF File - 2: assert foreclosureCaseModel.getOwnerFirstName() == "Terra Property QOZ Fund III LLC"
2024-07-17 23:38:08.187 DEBUG testcase.Parse PDF File - 3: assert foreclosureCaseModel.getOwnerLastName()!= "LLC"
2024-07-17 23:38:08.189 DEBUG testcase.Parse PDF File - 4: assert getPropertyAddress().getAddress() == "3133-3135 Sutherland Avenue"
2024-07-17 23:38:08.193 INFO c.k.katalon.core.main.TestCaseExecutor - END CALL Test Cases/Unit Tests/Parse PDF File
2024-07-17 23:38:08.193 INFO c.k.katalon.core.main.TestCaseExecutor - --------------------
2024-07-17 23:38:08.200 DEBUG testcase.Parse PDF File Consistent data - 2: delay(5)
2024-07-17 23:38:13.236 DEBUG testcase.Parse PDF File Consistent data - 1: callTestCase(findTestCase("Test Cases/Unit Tests/Parse PDF File"), null)
2024-07-17 23:38:13.310 INFO c.k.katalon.core.main.TestCaseExecutor - --------------------
2024-07-17 23:38:13.310 INFO c.k.katalon.core.main.TestCaseExecutor - CALL Test Cases/Unit Tests/Parse PDF File
2024-07-17 23:38:13.327 DEBUG testcase.Parse PDF File - 1: foreclosureCaseModel = GetInstance().parseCourtCasePdf(new java.io.File($WebDriverUtils.GetDownloadDirectory()/ECF Complaint.pdf.pdf))
=====>> Thread Run: id=run_sPzEKlWWug3PNvT34otPqOre, status=QUEUED
Based on the extracted information from the provided PDF, here is the data in JSON format:

``` { "caseNumber": "49D01-2404-MF-018805", "owner1Name": "Terra Property QOZ Fund III LLC", "owner2Name": "Yohan Naraine", "mailingAddress": "504 Main Street, Beech Grove, Indiana 46107", "propertyAddress": "3133-3135 Sutherland Avenue, Indianapolis, Indiana 46205" } ```
This information was extracted from the following sections of the document:

- Case Number: "49D01-2404-MF-018805"【8:0†source】
- Owner 1 Name: "Terra Property QOZ Fund III LLC"【8:1†source】
- Owner 2 Name: "Yohan Naraine"【8:1†source】
- Mailing Address: "504 Main Street, Beech Grove, Indiana 46107"【8:1†source】
- Property Address: "3133-3135 Sutherland Avenue, Indianapolis, Indiana 46205"【8:0†source】【8:1†source】
=====>> Thread Run: id=run_sPzEKlWWug3PNvT34otPqOre, status=COMPLETED
2024-07-17 23:38:23.636 DEBUG testcase.Parse PDF File - 2: assert foreclosureCaseModel.getOwnerFirstName() == "Terra Property QOZ Fund III LLC"
2024-07-17 23:38:23.637 DEBUG testcase.Parse PDF File - 3: assert foreclosureCaseModel.getOwnerLastName()!= "LLC"
2024-07-17 23:38:23.638 DEBUG testcase.Parse PDF File - 4: assert getPropertyAddress().getAddress() == "3133-3135 Sutherland Avenue"
2024-07-17 23:38:23.638 INFO c.k.katalon.core.main.TestCaseExecutor - END CALL Test Cases/Unit Tests/Parse PDF File
2024-07-17 23:38:23.638 INFO c.k.katalon.core.main.TestCaseExecutor - --------------------
2024-07-17 23:38:23.639 DEBUG testcase.Parse PDF File Consistent data - 2: delay(5)
2024-07-17 23:38:28.646 DEBUG testcase.Parse PDF File Consistent data - 1: callTestCase(findTestCase("Test Cases/Unit Tests/Parse PDF File"), null)
2024-07-17 23:38:28.710 INFO c.k.katalon.core.main.TestCaseExecutor - --------------------
2024-07-17 23:38:28.710 INFO c.k.katalon.core.main.TestCaseExecutor - CALL Test Cases/Unit Tests/Parse PDF File
2024-07-17 23:38:28.728 DEBUG testcase.Parse PDF File - 1: foreclosureCaseModel = GetInstance().parseCourtCasePdf(new java.io.File($WebDriverUtils.GetDownloadDirectory()/ECF Complaint.pdf.pdf))
=====>> Thread Run: id=run_CfXrjkCnbZl4KyLR2VEqY9oM, status=QUEUED
Based on the extracted information from the provided PDF, here is the data in JSON format:

``` { "caseNumber": "49D01-2404-MF-018805", "owner1Name": "Terra Property QOZ Fund III LLC", "owner2Name": "Yohan Naraine", "mailingAddress": "504 Main Street, Beech Grove, Indiana 46107", "propertyAddress": "3133-3135 Sutherland Avenue, Indianapolis, Indiana 46205" } ```
This information was extracted from the following sections of the document:

- Case Number: "49D01-2404-MF-018805"【12:0†source】
- Owner 1 Name: "Terra Property QOZ Fund III LLC"【12:0†source】
- Owner 2 Name: "Yohan Naraine"【12:0†source】
- Mailing Address: "504 Main Street, Beech Grove, Indiana 46107"【12:0†source】
- Property Address: "3133-3135 Sutherland Avenue, Indianapolis, Indiana 46205"【12:0†source】【12:0†source】
=====>> Thread Run: id=run_CfXrjkCnbZl4KyLR2VEqY9oM, status=COMPLETED
2024-07-17 23:38:51.293 DEBUG testcase.Parse PDF File - 2: assert foreclosureCaseModel.getOwnerFirstName() == "Terra Property QOZ Fund III LLC"
2024-07-17 23:38:51.296 DEBUG testcase.Parse PDF File - 3: assert foreclosureCaseModel.getOwnerLastName()!= "LLC"
2024-07-17 23:38:51.297 DEBUG testcase.Parse PDF File - 4: assert getPropertyAddress().getAddress() == "3133-3135 Sutherland Avenue"
2024-07-17 23:38:51.299 INFO c.k.katalon.core.main.TestCaseExecutor - END CALL Test Cases/Unit Tests/Parse PDF File
2024-07-17 23:38:51.299 INFO c.k.katalon.core.main.TestCaseExecutor - --------------------
2024-07-17 23:38:51.300 DEBUG testcase.Parse PDF File Consistent data - 2: delay(5)
2024-07-17 23:38:56.318 DEBUG testcase.Parse PDF File Consistent data - 1: callTestCase(findTestCase("Test Cases/Unit Tests/Parse PDF File"), null)
2024-07-17 23:38:56.382 INFO c.k.katalon.core.main.TestCaseExecutor - --------------------
2024-07-17 23:38:56.382 INFO c.k.katalon.core.main.TestCaseExecutor - CALL Test Cases/Unit Tests/Parse PDF File
2024-07-17 23:38:56.394 DEBUG testcase.Parse PDF File - 1: foreclosureCaseModel = GetInstance().parseCourtCasePdf(new java.io.File($WebDriverUtils.GetDownloadDirectory()/ECF Complaint.pdf.pdf))
=====>> Thread Run: id=run_gXiRstIxDqYR6rz3FLuFlv6M, status=QUEUED
2024-07-17 23:39:06.855 ERROR c.k.katalon.core.main.TestCaseExecutor - ❌ Test Cases/Unit Tests/Parse PDF File FAILED.
Reason: java.lang.Exception: We got a response back, that doesn't contain the JSON: ''
	at me.mikewarren.myCaseScraper.utils.openAI.OpenAIUtils.parseCourtCasePdf(OpenAIUtils.groovy:86)
	at Parse PDF File.run(Parse PDF File:8)

What does your code look like:

The relevant OpenAIUtils methods look like this:

	public ForeclosureCaseModel parseCourtCasePdf(File pdfFile) {
		final String responseText = this.sendConversationMessage(pdfConversationHelper, pdfFile);
		if (responseText.indexOf("```json") == -1)
			throw new Exception("We got a response back, that doesn't contain the JSON:\n\n'${responseText}'")
			
		return ForeclosureCaseModel.FromJSON(responseText.substring(responseText.indexOf("{\n"),
			responseText.indexOf("```\n")));
	}

	public String sendConversationMessage(BaseConversationHelper conversationHelper, File file) {
		final String threadId = conversationHelper.getThread().getId(),
		assistantId = conversationHelper.getAssistant().getId();

		openAI.threadMessages()
				.create(threadId, ThreadMessageRequest.builder()
				.role(ThreadMessageRole.USER)
				.content(conversationHelper.getContent())
				.attachment(Attachment.builder()
					.fileId(this.uploadFile(file))
						.tool(AttachmentTool.FILE_SEARCH)
						.build())
					.build())
				.join();

		return this.handleRunEvents(openAI.threadRuns()
				.createStream(threadId, ThreadRunRequest.builder()
					.assistantId(assistantId)
					.build())
				.join(),
				conversationHelper);
	}

	public String uploadFile(File file) {
		return openAI.files()
				.create(FileRequest.builder()
					.file(Paths.get(file.getPath()))
					.purpose(PurposeType.ASSISTANTS)
					.build())
				.join()
				.getId();
	}

	/**
	 * SOURCE: https://github.com/sashirestela/simple-openai/blob/main/src/demo/java/io/github/sashirestela/openai/demo/ConversationV2Demo.java#L120
	 * @param runStream
	 */
	private String handleRunEvents(Stream<Event> runStream, BaseConversationHelper conversationHelper) {
		String responseText = "";

		runStream.forEach({ event ->
			switch (event.getName()) {
				case EventName.THREAD_RUN_CREATED:
				case EventName.THREAD_RUN_COMPLETED:
				case EventName.THREAD_RUN_REQUIRES_ACTION:
					ThreadRun run = (ThreadRun) event.getData();
					System.out.println("=====>> Thread Run: id=" + run.getId() + ", status=" + run.getStatus());
					if (run.getStatus().equals(RunStatus.REQUIRES_ACTION)) {
						FunctionExecutor functionExecutor = conversationHelper.getFunctionExecutor();

						if (functionExecutor == null)
							throw new IllegalStateException("Somehow, we have a run event that is in the 'REQUIRES_ACTION' state, but no function executor to use for it!");

						def toolCalls = run.getRequiredAction().getSubmitToolOutputs().getToolCalls();
						def toolOutputs = functionExecutor.executeAll(toolCalls, { toolCallId, result ->
							ToolOutput.builder()
									.toolCallId(toolCallId)
									.output(result)
									.build()
						});
						def runSubmitToolStream = openAI.threadRuns()
								.submitToolOutputStream(conversationHelper.getThread().getId(), run.getId(), ThreadRunSubmitOutputRequest.builder()
								.toolOutputs(toolOutputs)
								.stream(true)
								.build())
								.join();
						handleRunEvents(runSubmitToolStream, conversationHelper);
					}
					break;
				case EventName.THREAD_MESSAGE_DELTA:
					ThreadMessageDelta msgDelta = (ThreadMessageDelta) event.getData();
					def content = msgDelta.getDelta().getContent().get(0);
					if (content instanceof ContentPartTextAnnotation) {
						ContentPartTextAnnotation textContent = (ContentPartTextAnnotation) content;
						final String textValue = textContent.getText().getValue();
						responseText += textValue;
						print textValue;
					}
					break;
				case EventName.THREAD_MESSAGE_COMPLETED:
					System.out.println();
					break;
				default:
					break;
			}
		});

		return responseText;
	}

and the BaseConversationHelper looks like:

public abstract class BaseConversationHelper {
	protected SimpleOpenAI openAI;
	protected String assistantName, assistantInstructions;
	protected Assistant assistant;
	protected VectorStore vectorStore;
	protected Thread thread;

	public BaseConversationHelper() {
		super();
	}

	public BaseConversationHelper(SimpleOpenAI openAI, String assistantName, String assistantInstructions) {
		super();
		this.openAI = openAI;
		this.assistantName = assistantName;
		this.assistantInstructions = assistantInstructions;
	}

	public Assistant getAssistant() {
		if (this.assistant == null) { 
			Assistants assistants = openAI.assistants();
			
			Assistant existingAssistant = assistants
				.getList()
				.get()
				.find { Assistant assistant -> return assistant.getName() == this.assistantName };
				
			this.assistant = existingAssistant;
					
			if (existingAssistant == null) { 
				this.assistant = assistants
					.create(this.createRequest())
					.join();
				
				KeywordUtil.logInfo("Assistant was created with id: ${this.assistant.getId()} and name '${this.assistant.getName()}'");
			}
		}

		return this.assistant;
	}

	public AssistantRequest createRequest() {
		AssistantRequestBuilder builder = AssistantRequest.builder()
				.name(this.assistantName)
				.model("gpt-4o")
				.instructions(this.assistantInstructions)
				.tool(AssistantTool.fileSearch())
					.toolResources(ToolResourceFull.builder()
							.fileSearch(FileSearch.builder().vectorStoreId(this.vectorStore.getId()).build())
							.build())
				.temperature(0);
					

		if (this.functionExecutor != null)
			builder.tools(functionExecutor.getToolFunctions());

		return builder.build();
	}

	public Thread getThread() {
		if (this.thread == null) { 
			this.thread = openAI.threads()
				.create(ThreadRequest.builder().build())
				.join();
			
			KeywordUtil.logInfo("Thread created with id: ${this.thread.getId()}")
		}

		return this.thread;
	}

	public FunctionExecutor getFunctionExecutor() {
		return null;
	}
	
	public abstract String getContent();
}


Why am I getting an empty response after the third or fourth iteration?

UPDATE

Even if I turn the streaming off in my OpenAIUtils.sendConversationMessage() util method:

	public String sendConversationMessage(BaseConversationHelper conversationHelper, File file) {
		final String threadId = conversationHelper.getThread().getId(),
		assistantId = conversationHelper.getAssistant().getId();

		openAI.threadMessages()
				.create(threadId, ThreadMessageRequest.builder()
				.role(ThreadMessageRole.USER)
				.content(conversationHelper.getContent())
				.attachment(Attachment.builder()
					.fileId(this.uploadFile(file))
						.tool(AttachmentTool.FILE_SEARCH)
						.build())
					.build())
				.join();

		ThreadRun threadRun = openAI.threadRuns()
			.createAndPoll(threadId, ThreadRunRequest.builder()
					.assistantId(assistantId)
					.build())
					
		Page<ThreadMessage> messages = openAI.threadMessages()
			.getList(threadId, PageRequest.builder().build(), threadRun.getId())
			.join();
			
		if (messages.isEmpty())
			throw new Exception("Somehow, our request got back a response with no messages!")
			
		ContentPart contentPart = messages.first()
			.getContent().first();
			
		return ((ContentPart.ContentPartTextAnnotation)contentPart).getText().getValue();
			
			
//		return this.handleRunEvents(openAI.threadRuns()
//				.createStream(threadId, ThreadRunRequest.builder()
//					.assistantId(assistantId)
//					.build())
//				.join(),
//				conversationHelper);
	}

I still face the issue!

It seems like we end up with a thread run that gets queued, but never completes.

@sashirestela

Hi @mwarren04011990

If you are seeing an issue with threads, I would build the SimpleOpenAI class with a non-default HttpClient component in order to pass a custom thread executor:

var httpClient = HttpClient.newBuilder()
    .executor(Executors.newFixedThreadPool(10))  // or another executor
    .build();

var openAI = SimpleOpenAI.builder()
    .apiKey(System.getenv("OPENAI_API_KEY"))
    .httpClient(httpClient)
    .build();

As I mentioned in another post, it is exciting to know that simple-openai helps colleagues like you in their AI projects. That being said, I invite you to also use simple-openai’s GitHub repository to post your questions, issues, and ideas, so you can contribute that way.

Best regards

I was able to find the issue by logging the threadRun:


threadRun == ThreadRun(id=run_6MrMxPZCpXAkEPHx5ktw6iwN, object=thread.run, createdAt=1721335426, threadId=thread_vuFs62QpGgJgTifx1zUAJiAL, assistantId=asst_55z285xZHyao8CQvVm30RsVI, status=FAILED, requiredAction=null, lastError=LastError(code=RATE_LIMIT_EXCEEDED, message=Rate limit reached for gpt-4o in organization org-pMh1hYZPSUh25nsBMqkgzORF on tokens per min (TPM): Limit 30000, Used 9819, Requested 21492. Please try again in 2.622s. Visit https://platform.openai.com/account/rate-limits to learn more.), expiresAt=null, startedAt=1721335427, cancelledAt=null, failedAt=1721335434, completedAt=null, incompleteDetails=null, model=gpt-4o, instructions=
You are a court case examiner. 

You have been tasked with pulling property owner(s) names and mailing addresses, as well as the address of the property that they have Foreclosure Case on, from a complaint PDF.

Extract those from this PDF, and output the data in JSON format that looks like the following: 

{
  caseNumber,
  owner1Name,
  owner2Name,
  mailingAddress,
  propertyAddress,
}

The case number takes the form '49xxx-yymm-MF-xxxxxx'
, tools=[Tool(type=FILE_SEARCH, function=null)], metadata={}, usage=Usage(promptTokens=1418, completionTokens=18, totalTokens=1436), temperature=0.01, topP=1.0, maxPromptTokens=null, maxCompletionTokens=null, truncationStrategy=TruncationStrategy(type=AUTO, lastMessages=null), toolChoice=auto, parallelToolCalls=true, responseFormat=auto)
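The dump shows status=FAILED together with a populated lastError, which would explain why the parsing code only ever saw an empty string: neither version of sendConversationMessage() looks at the final run status before reading the text. A minimal sketch of a guard that surfaces the real cause is below. The `RunOutcome` class and its `Status` enum are hypothetical stand-ins written for illustration, not simple-openai types; in the real code the three arguments would come from the logged ThreadRun (its status, and the code and message of its lastError).

```java
import java.util.Optional;

final class RunOutcome {
    /** Stand-in for the library's run status; not the simple-openai enum itself. */
    enum Status { QUEUED, IN_PROGRESS, COMPLETED, FAILED, CANCELLED, EXPIRED, INCOMPLETE }

    /** Describes why a run produced no usable answer, or empty if it completed. */
    static Optional<String> failureReason(Status status, String errorCode, String errorMessage) {
        if (status == Status.COMPLETED) {
            return Optional.empty();
        }
        // lastError may be null for states such as CANCELLED or EXPIRED.
        String detail = errorCode == null
                ? "no lastError reported"
                : errorCode + ": " + errorMessage;
        return Optional.of("Run ended with status " + status + " (" + detail + ")");
    }
}
```

Throwing an exception with this reason, before reading any message text, would have reported the rate limit directly instead of "doesn't contain the JSON".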

The issue seems to be a rate limit error, which I need to find a workaround for…
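One possible workaround is to wait for the time the error message itself suggests and then start a new run. The numbers in the log are consistent with a simple budget calculation: 9819 used + 21492 requested = 31311 tokens, which is 1311 over the 30000 TPM limit, and at 30000 / 60 = 500 tokens per second that overage clears in 1311 / 500 = 2.622 s. The sketch below is an illustration under assumptions, not part of simple-openai: `RateLimitRetry`, `RateLimitedException`, the 5-second fallback, and the regular expression for the "try again in …" hint are all hypothetical, and the message format is not a documented contract.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RateLimitRetry {
    // Matches hints such as "Please try again in 2.622s" (assumed format; "ms" also accepted).
    private static final Pattern RETRY_HINT =
            Pattern.compile("try again in (\\d+(?:\\.\\d+)?)(ms|s)");

    /** Hypothetical exception the caller throws when a run fails with RATE_LIMIT_EXCEEDED. */
    static final class RateLimitedException extends Exception {
        RateLimitedException(String message) { super(message); }
    }

    /** Returns the wait suggested in the error message, or the fallback if there is none. */
    static Duration suggestedDelay(String errorMessage, Duration fallback) {
        if (errorMessage == null) return fallback;
        Matcher m = RETRY_HINT.matcher(errorMessage);
        if (!m.find()) return fallback;
        double value = Double.parseDouble(m.group(1));
        boolean seconds = m.group(2).equals("s");
        return Duration.ofMillis(Math.round(seconds ? value * 1000 : value));
    }

    /** Time for a per-minute budget to free up the tokens a request overshoots by. */
    static Duration delayFromBudget(long limitPerMinute, long used, long requested) {
        long overage = used + requested - limitPerMinute;
        if (overage <= 0) return Duration.ZERO;
        return Duration.ofMillis(Math.round(overage * 60_000.0 / limitPerMinute));
    }

    /** Runs the call, sleeping for the suggested delay after each rate-limit failure. */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (RateLimitedException e) {
                if (attempt >= maxAttempts) throw e; // out of attempts: report the last failure
                Thread.sleep(suggestedDelay(e.getMessage(), Duration.ofSeconds(5)).toMillis());
            }
        }
    }
}
```

In the util method, the Callable would create the run, check its status, and throw `RateLimitedException` with the lastError message when the code is RATE_LIMIT_EXCEEDED. Since a single request here asks for roughly 21000 of the 30000 tokens per minute, the fixed 5-second delay in the test loop is probably too short on its own. Spacing the iterations further apart, or reducing what file search pulls into the prompt, may be needed in addition to retrying.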
