Thank you for looking into this. Let me explain.
This API call is part of a hybrid context-selection strategy for a RAG implementation.
Temperature is set near zero to force the model to choose an answer from the provided list rather than hallucinate one.
Here is the redacted prompt:
PROMPT_TEMPLATE = """
1. You are an AI agent selecting relevant knowledge base articles for a user inquiry about [REDACTED] app.
2. [REDACTED]
3. Use your notepad to enrich the given user inquiry by adding synonyms for key terms, leveraging your knowledge of the [REDACTED]. Maintain the original inquiry while adding these synonyms to make the inquiry more detailed and explicit.
4. Review the enriched inquiry: [ENRICHED INQUIRY HERE]
5. Review given list of available titles: {all_titles}
6. Use your knowledge about [REDACTED], and common user issues to determine which titles could include the necessary information to answer the enriched inquiry.
7. Select exclusively from list above only those titles that are highly relevant to answering the enriched inquiry, strictly prioritizing titles that match key terms or concepts.
8. If no specific titles match, fall back to general titles that still relate to the user's inquiry, such as:
- "how_it_works" - how [REDACTED] works to manage [REDACTED] data ...
[SKIPPED]
"""
Then I inject the {all_titles} list of document titles to select from (~250 titles) into this prompt.
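For clarity, the injection is a plain `str.format` on the template. A minimal sketch, using a one-line stand-in for the redacted template above (only the placeholder matters here):

```python
# Stand-in for the redacted PROMPT_TEMPLATE above; only the {all_titles}
# placeholder matters for this sketch.
PROMPT_TEMPLATE = "5. Review given list of available titles: {all_titles}"

all_titles = ["about-us", "how_it_works"]  # in practice ~250 titles

# str.format renders the list with repr(), producing the
# "['about-us', 'how_it_works', ...]" shape visible in the log below.
system_prompt = PROMPT_TEMPLATE.format(all_titles=all_titles)
```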
The answer is expected to be structured and parsed using this Pydantic model:
from typing import List
from pydantic import BaseModel, Field

class TitleSelection(BaseModel):
    selected_titles: List[str] = Field(..., description="Selected document titles.")
    enriched_inquiry: str = Field(..., description="Enriched user inquiry")
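To make the call shape explicit, here is a sketch reconstructed from the DEBUG params in the log below (assumed wiring, not the exact project code; `client` is an `openai.OpenAI` instance and `response_format` is the `TitleSelection` model above):

```python
def build_params(system_prompt: str, user_inquiry: str, response_format) -> dict:
    # Mirrors the params dict logged by openai_query_processor.
    return {
        "model": "gpt-4o-2024-11-20",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_inquiry},
        ],
        "temperature": 1e-06,
        "max_completion_tokens": 16000,
        "response_format": response_format,
    }

def select_titles(client, system_prompt: str, user_inquiry: str, response_format):
    # client.beta.chat.completions.parse validates the JSON response against
    # the Pydantic model; it raises LengthFinishReasonError when the model
    # stops with finish_reason == "length", which is the failure logged below.
    completion = client.beta.chat.completions.parse(
        **build_params(system_prompt, user_inquiry, response_format)
    )
    return completion.choices[0].message.parsed
```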
Here is the redacted log record:
2025-01-03 13:54:32,557 - openai_query_processor - DEBUG - Calling OpenAI API with params: {'model': 'gpt-4o-2024-11-20', 'messages': [{'role': 'system', 'content': '1. You are an AI agent selecting relevant knowledge base articles for a user inquiry about [REDACTED]...[SKIPPED] \n5. Review given list of available titles: [\'about-us\', \'how_it_works\', ... [SKIPPED]]}, {'role': 'user', 'content': 'Thank you! Everything is working correctly now with the import process?'}], 'temperature': 1e-06, 'max_completion_tokens': 16000, 'response_format': <class 'src.openai.openai_context_selector.TitleSelection'>}
2025-01-03 13:57:39,608 - api_retry_handler - ERROR - Unexpected error: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16000, prompt_tokens=3505, total_tokens=19505, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=3328))
Traceback (most recent call last):
File "/agent/src/shared/api_retry_handler.py", line 14, in execute_func
return func()
^^^^^^
File "/agent/src/openai/openai_query_processor.py", line 35, in <lambda>
lambda: self._request_func(
^^^^^^^^^^^^^^^^^^^
File "/agent/src/openai/openai_query_processor.py", line 61, in _request_func
response = api_method(**params)
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/resources/beta/chat/completions.py", line 156, in parse
return self._post(
^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1280, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 957, in request
return self._request(
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1063, in _request
return self._process_response(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_base_client.py", line 1162, in _process_response
return api_response.parse()
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/_response.py", line 319, in parse
parsed = self._options.post_parser(parsed)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/resources/beta/chat/completions.py", line 150, in parser
return _parse_chat_completion(
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/openai/lib/_parsing/_completions.py", line 72, in parse_chat_completion
raise LengthFinishReasonError(completion=chat_completion)
openai.LengthFinishReasonError: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16000, prompt_tokens=3505, total_tokens=19505, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=3328))