When are 400 errors thrown by the Answers endpoint?

I think I know the answer to this question but just want to confirm. It seems the 400 behavior is different when using a file vs the documents parameter when calling the Answers endpoint.

For example, when I use a file and ask a question whose answer can’t be derived from the documents, the API throws a 400 error. However, if I make the same request, with the same question and the same documents, but provide the documents using the documents parameter, I don’t get a 400 error. Rather, I get an answer that could be way outside the scope of my documents.

My assumption is that this is because the “two-step search procedure” is being used with the file and not the documents sent via the documents parameter.

Can someone confirm or clarify this for me?

Hi @stevet!

I think you basically nailed it. That said, it sounds like there might be a few things going on here. I see two possible cases:

(1, most likely) When a user sends a request to /v1/answers with the file parameter, we’ll try to filter the number of possible results down to max_rerank. Right now that filtering process is keyword-based, so it’s fairly easy for nothing to come back from this step. In that case, you should get a 400 error back with a message to the effect of “no documents found” or something like that.

If you submit a request with the documents parameter set, we assume you want those examples reranked and so we don’t do a filtering step. We just immediately rerank them.
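To make the difference between the two paths concrete, here is a rough Python sketch of the behavior described above. It is not the actual server implementation: the function names, the exception class, and the word-overlap scoring are all assumptions, chosen only to illustrate "keyword filter first, error if nothing survives" versus "skip the filter and go straight to reranking".

```python
import re


class NoSimilarDocumentsError(Exception):
    """Hypothetical stand-in for the API's 400 invalid_request_error."""


def tokenize(text):
    # Lowercase the text and split it into a set of alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_filter(question, docs, max_rerank):
    # Assumed filter: keep documents sharing at least one word with the
    # question, best overlap first, capped at max_rerank.
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(doc)), doc) for doc in docs]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches[:max_rerank]]


def select_candidates(question, *, file_docs=None, documents=None, max_rerank=10):
    # "documents" path: no filtering step, everything goes to the reranker.
    if documents is not None:
        return list(documents)
    # "file" path: keyword filter first; an empty result becomes an error.
    candidates = keyword_filter(question, file_docs, max_rerank)
    if not candidates:
        raise NoSimilarDocumentsError("No similar documents were found in file.")
    return candidates
```

With the two puppy documents from the docs example, `select_candidates("How old are you?", documents=docs)` returns both documents untouched, while the same question with `file_docs=docs` raises, because no word in the question appears in either document.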

(2) That 404 is bugging me a little bit. It’s not quite the status code I’d expect for a file that didn’t return results. Can you post the error message you’re getting and maybe the api call you’re making?

Thanks for the quick reply, @hallacy! The error is actually a 400 error, not a 404 (my bad). So the error message is being included in the response payload. Here is the response payload I’m getting.

  error: {
    code: null,
    message: "No similar documents were found in file with ID 'file-oi6E5kMlnYSb0Zx05WmSKJSX'.Please upload more documents or adjust your query.",
    param: null,
    type: 'invalid_request_error'
  }

It can be recreated using the example data from the documentation page which is what I used when I originally noted the behavior. Here are the contents of the .jsonl

{"text": "puppy A is happy", "metadata": "emotional state of puppy A"}
{"text": "puppy B is sad", "metadata": "emotional state of puppy B"}

Lastly, here are the request parameters I used.

{
  "file": "file-oi6E5kMlnYSb0Zx05WmSKJSX",
  "question": "How old are you?",
  "search_model": "ada",
  "model": "curie",
  "examples_context": "In 2017, U.S. life expectancy was 78.6 years.",
  "examples": [["What is human life expectancy in the United States?", "78 years."]],
  "max_rerank": 10,
  "max_tokens": 5,
  "stop": ["\n", "<|endoftext|>"],
  "return_metadata": true,
  "return_prompt": true
}

It makes sense to me now. So, basically, if there isn’t a keyword match, you’ll get a 400 error. Correct?

Bingo. We’re working on updating that filtering step to make it more robust in the future. If you have ideas I’m all ears!
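For anyone who wants to handle this case on the client side, something like the following might work. It is only a sketch: the helper name is made up, and matching on the message text is an assumption based on the payload posted above, so it would break if the wording of the error ever changes.

```python
def is_no_keyword_match_error(status_code, body):
    """Return True if a response looks like the 'no similar documents' 400.

    status_code: HTTP status of the response.
    body: parsed JSON response body (a dict).
    """
    if status_code != 400:
        return False
    error = body.get("error") or {}
    message = error.get("message") or ""
    # Assumption: the message prefix seen in this thread identifies the case.
    return (
        error.get("type") == "invalid_request_error"
        and "No similar documents were found" in message
    )


# Usage with the payload from this thread.
payload = {
    "error": {
        "code": None,
        "message": (
            "No similar documents were found in file with ID "
            "'file-oi6E5kMlnYSb0Zx05WmSKJSX'.Please upload more documents "
            "or adjust your query."
        ),
        "param": None,
        "type": "invalid_request_error",
    }
}
if is_no_keyword_match_error(400, payload):
    fallback_answer = "Sorry, I can't answer that from the uploaded documents."
```

Treating this particular 400 as "no answer available" rather than as a failure keeps it separate from other invalid-request errors, which still deserve to surface as bugs.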

Thanks again. I don’t think it’s bad now - I mostly just wanted to understand it. The keyword filtering seems like a good way to filter down the number of possible results. That said, it might be nice if there was a way to “seed” a keyword list to create a broader context if needed. So, not just from the question. Maybe another parameter with an array of keywords. Does that make sense?

Oh interesting! I hadn’t thought of that. And that way you can expand the search list but it won’t interfere with the query?

Exactly. Also, with that, you could run your own classification task for the question to match pre-defined keywords that could then be used for the request to the Answers endpoint.
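Here is a rough sketch of how that idea could look, done entirely on the client side. Everything in it is hypothetical: the Answers endpoint has no keyword-seeding parameter, the topic table and trigger words are invented for illustration, and the rule-based `classify` function is only a stand-in for a real classification step.

```python
import re

# Hypothetical pre-defined keyword lists, one per topic.
TOPIC_KEYWORDS = {
    "emotion": ["happy", "sad"],
    "age": ["old", "age", "years"],
}

# Hypothetical trigger words standing in for a real classifier.
TOPIC_TRIGGERS = {
    "emotion": {"feel", "feeling", "mood"},
    "age": {"old", "age"},
}


def tokenize(text):
    # Lowercase the text and split it into a set of alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(question):
    # Return every topic whose trigger words appear in the question.
    tokens = tokenize(question)
    return [topic for topic, triggers in TOPIC_TRIGGERS.items() if tokens & triggers]


def seed_keywords(question):
    # Collect the pre-defined keywords for each topic the question maps to.
    seeds = set()
    for topic in classify(question):
        seeds.update(TOPIC_KEYWORDS[topic])
    return sorted(seeds)


def filter_with_seeds(question, docs, seeds=()):
    # Keyword filter whose search terms are the question words plus the seeds.
    terms = tokenize(question) | set(seeds)
    return [doc for doc in docs if terms & tokenize(doc)]
```

For the puppy documents, a question like "What mood are the dogs in?" shares no words with either document, so the plain filter returns nothing. Once the seeds from `seed_keywords` are added, both documents pass the filter, while the question text itself stays unchanged.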
