Question about JSONL file for search endpoint

Hi there,

I’m planning to use the completion endpoint with a prompt built from the results of a search. The goal is to construct the completion prompt dynamically from the data returned by search, like so:

Prompt structure:

Instructions to the engine.
Example 1
Example 2
Example 3
User input field 1 label: user input field 1 value.
User input field 2 label: user input field 2 value.
User input field 3 label: user input field 3 value.
User input field 4 label: user input field 4 value.

Completion by the engine:
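A minimal Python sketch of assembling that prompt. It assumes each example has already been rendered as a text block and the user’s fields arrive as (label, value) pairs. The function name `build_prompt` and the blank-line separators are illustrative choices, not part of any API.

```python
def build_prompt(instructions: str, examples: list[str], fields: list[tuple[str, str]]) -> str:
    """Assemble the completion prompt: instructions, examples, then the user's fields.

    The prompt ends with an open "Completion by the engine:" line so the
    model continues from there.
    """
    parts = [instructions.strip()]
    # Each example is assumed to be an already-rendered text block.
    parts.extend(example.strip() for example in examples)
    # One "label: value" line per user input field, then the open completion line.
    user_block = "\n".join(f"{label}: {value}" for label, value in fields)
    parts.append(user_block + "\n\nCompletion by the engine:")
    # Blank lines separate the instructions, each example, and the user block.
    return "\n\n".join(parts)
```

For example, `build_prompt("Write a product description.", examples, [("Field 1", "..."), ("Field 4", "...")])` returns a single string ready to send as the completion prompt.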

where the examples are the documents that search finds in the file.

Here is how I need each example to look:

Example structure:

User input field 1 label: meta field 1 value.
User input field 2 label: meta field 2 value.
User input field 3 label: meta field 3 value.
User input field 4 label: text found by search engine

Completion by the engine: meta field 4 value
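One way to render a single search result into that shape, assuming the result carries the document’s `text` plus a `metadata` dict whose first three keys hold fields 1 to 3 and whose fourth key holds the validated completion (the layout proposed just below). `format_example`, `field_labels` and `meta_keys` are hypothetical names used only for illustration.

```python
def format_example(doc: dict, field_labels: list[str], meta_keys: list[str]) -> str:
    """Render one document as a worked example for the prompt.

    field_labels: the four user-facing labels, in order.
    meta_keys: the metadata keys for fields 1-3, then the key holding the
    human-validated completion (four keys in total).
    """
    meta = doc["metadata"]
    # Fields 1-3 come from the document's metadata.
    lines = [f"{label}: {meta[key]}" for label, key in zip(field_labels[:3], meta_keys[:3])]
    # Field 4 is the document text, i.e. what the search matched on.
    lines.append(f"{field_labels[3]}: {doc['text']}")
    # Blank line, then the validated completion stored under the fourth key.
    lines.append("")
    lines.append(f"Completion by the engine: {meta[meta_keys[3]]}")
    return "\n".join(lines)
```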

I was wondering whether the objects in the JSONL file can be structured like this:

Document object in file structure (pretty-printed here for readability; in the JSONL file each object must sit on a single line):

{
  "text": "User input field 4 value",
  "metadata": {
    "meta field 1 label": "User input field 1 value",
    "meta field 2 label": "User input field 2 value",
    "meta field 3 label": "User input field 3 value",
    "meta field 4 label": "Completion by engine previously validated by human"
  }
}

The goal is to use the file as a source of examples for the completion engine, to improve the generated text. Given the prompt length limit, search would pick the examples most relevant to the user input. It’s a sort of training data for the model.
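Since JSONL requires exactly one JSON object per line, a small pair of helpers like the following (hypothetical names `write_jsonl` and `read_jsonl`) can write the documents and confirm that every line parses before uploading. This sketch only checks JSON validity; it does not check any other upload requirement the API may have.

```python
import json


def write_jsonl(path: str, documents: list[dict]) -> None:
    """Write one JSON object per line, which is what a JSONL file requires."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in documents:
            # json.dumps without indent keeps each object on a single line.
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> list[dict]:
    """Parse the file back, failing loudly on the first invalid line."""
    docs = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                docs.append(json.loads(line))
            except json.JSONDecodeError as err:
                raise ValueError(f"line {lineno} is not valid JSON: {err}") from err
    return docs
```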


1. Can documents be structured like that in the file?
2. Will the search engine also use the metadata to find the best results?
3. How should I build the search query text so that it includes all four user input fields?
4. How can I make the search engine return full documents (text and metadata) in response to a query? (In the docs it looks like it returns the document IDs with metadata; that wasn’t clear to me, sorry.) Edit: found the answer here: OpenAI API
5. Is there any way to limit the search results by volume of text (say, return 4 results when they are short, but only 2 when the text is long)?
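For questions 3 and 4, the request might look roughly like this in Python (the JSON body is the same one a curl call would send). This is a sketch against the legacy Search endpoint as documented at the time, `POST /v1/engines/{engine}/search` with `file`, `query`, `max_rerank` and `return_metadata`; that endpoint has since been deprecated, so treat the path and parameter names as assumptions to verify. Joining all four fields into the query is just one option: since the reply below says metadata is ignored, only the document `text` is matched, so querying with field 4 alone may work as well or better. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed legacy endpoint path; verify against the current API reference.
SEARCH_URL = "https://api.openai.com/v1/engines/{engine}/search"


def build_query(fields: list[tuple[str, str]]) -> str:
    """Join the user's fields into one query string, one 'label: value' per line."""
    return "\n".join(f"{label}: {value}" for label, value in fields)


def build_search_request(api_key: str, engine: str, file_id: str,
                         query: str, max_rerank: int = 5) -> urllib.request.Request:
    """Prepare (but do not send) a search request against an uploaded file."""
    body = {
        "file": file_id,
        "query": query,
        "max_rerank": max_rerank,   # assumed: max documents re-ranked and returned
        "return_metadata": True,    # assumed: include each document's metadata
    }
    return urllib.request.Request(
        SEARCH_URL.format(engine=engine),
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

To send it: `with urllib.request.urlopen(req) as resp: results = json.load(resp)["data"]`, where the results list is expected under `data`.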

Any curl examples related to this would help a lot.

Thank you.

Hi Serge, metadata is not used by the endpoint in any way.

In short, to answer the questions above: the documents simply need to be valid JSON, but the metadata won’t be used.

There’s no way to use the Search endpoint to directly control the volume of text returned, but you could retrieve the results and keep only the relevant output yourself.

If you have labels you want to use, you might benefit from checking out the Classification endpoint.

OK, so I think I’ll request, say, 5 results, then calculate the “available” prompt space and add results from the top-ranked down until that space is filled.
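That plan is a greedy fill: walk the results from highest to lowest score and stop once the next one would overflow the remaining prompt space. A minimal sketch, assuming each result is a dict with `score` and `text` keys and measuring size in characters as a rough stand-in for tokens (a real budget would count tokens with the model’s tokenizer, and would measure the fully formatted example rather than the raw text). `pack_results` is a hypothetical name.

```python
def pack_results(results: list[dict], budget_chars: int) -> list[dict]:
    """Take results from highest to lowest score until the prompt budget is used up.

    Size is measured as len(result["text"]), a character-count proxy for tokens.
    """
    chosen = []
    used = 0
    for result in sorted(results, key=lambda r: r["score"], reverse=True):
        size = len(result["text"])
        if used + size > budget_chars:
            break  # stop at the first result that would overflow the budget
        chosen.append(result)
        used += size
    return chosen
```

Replacing `break` with `continue` would instead skip an oversized result and let smaller, lower-ranked ones fill the remaining space.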

The label of the user input will determine which model is responsible for content generation, so that each model stays highly targeted and optimized for its purpose. Each model will have its own data file, so there’s no need to add labels to the document structure.
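This amounts to a routing table: the input’s label selects both the engine and the example file that search runs against. A sketch with entirely hypothetical labels, engine names and file IDs, shown only to illustrate the dispatch:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    engine: str   # completion engine used for this category
    file_id: str  # uploaded JSONL file holding this category's examples


# Hypothetical labels, engine names and file IDs, purely for illustration.
ROUTES = {
    "product_description": Route(engine="davinci", file_id="file-aaa"),
    "support_reply": Route(engine="curie", file_id="file-bbb"),
}


def route_for(label: str) -> Route:
    """Pick the engine and example file for an input label; unknown labels raise."""
    try:
        return ROUTES[label]
    except KeyError:
        raise ValueError(f"no route configured for label {label!r}") from None
```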

Thanks for the feedback. Highly appreciated.
