Question answering with extended number of chunks


From what I understand, the most efficient approach to custom question answering with a ChatGPT model is embeddings-based search (i.e. enriching prompts with chunks retrieved from an embeddings database). Now, how do you tackle questions that span several chunks of data, or even all of the data? For instance, if every chunk contains the description of one shirt and you ask what the cheapest one is, the approach mentioned above won’t work…
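For readers less familiar with the retrieval step being described, it can be sketched roughly like this. The vectors here are toy made-up numbers; in practice they would come from an embeddings API (e.g. OpenAI's embeddings endpoint), and the nearest-neighbor search would run in a vector store rather than pure Python:

```python
import math

def top_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose embedding is closest to the query (cosine similarity)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
        return dot / norm if norm else 0.0

    # Rank chunk indices by descending similarity to the query vector
    order = sorted(range(len(chunks)), key=lambda i: -cosine(query_vec, chunk_vecs[i]))
    return [chunks[i] for i in order[:k]]
```

The retrieved chunks are then pasted into the prompt ahead of the user's question, which is exactly why a question like "what is the cheapest shirt?" breaks down: the answer may depend on chunks that were never retrieved.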

I thought of using fine-tuning in that case, with custom prompt/response pairs, but fine-tuning is currently limited to the davinci model (with the OpenAI APIs). So, what would be the best approach to meet that need while retaining the expressive power of the GPT-3.5 model?


Even if you could fit all the shirt information into the context, the model won’t necessarily find the cheapest one. These models are not general-purpose computer programs.

You might have better results by prompting the model with what your schema is, and then asking the model to generate SQL code that will answer the user’s question, and then separately run the SQL code. If you get a syntax error, re-prompt the model to ask it to fix it and try again.

E.g., you might do something like:

We have the following SQL table, with information about 10,000 shirts in it:


Create a SQL query to answer the following question about shirts.
Output only the SQL code, no comments or explanation.

{user question goes here}

SQL Code:
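The generate-run-retry loop described above might be wired up like this. `generate_sql` is a hypothetical stand-in for the chat-completion call (kept as a plain callable so the sketch stays self-contained), and SQLite stands in for whatever database actually holds the catalog:

```python
import sqlite3

def answer_with_sql(question, schema_sql, conn, generate_sql, max_retries=2):
    """Ask the model for SQL, run it, and re-prompt with the error on failure.

    generate_sql(prompt) -> str is a stand-in for a chat-completion request.
    """
    prompt = (
        f"We have the following SQL table:\n{schema_sql}\n\n"
        "Create a SQL query to answer the following question.\n"
        "Output only the SQL code, no comments or explanation.\n\n"
        f"{question}\n\nSQL Code:"
    )
    for _ in range(max_retries + 1):
        sql = generate_sql(prompt)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as e:
            # Feed the error back so the model can fix its own syntax
            prompt += f"\n\nThat query failed with: {e}. Please fix it.\n\nSQL Code:"
    raise RuntimeError("could not produce runnable SQL within the retry budget")
```

The key point is that the database, not the model, does the arithmetic; the model only translates the question into SQL.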

That’s not how embeddings are typically used.

I recommend you read this -

Your solution is clever, but it would require storing all the data in the form of SQL tables. Currently, I have product sheets. On top of that, it seems difficult at first sight to construct a response in natural language from the result of a SQL query. But perhaps that’s just a matter of prompt engineering. So, I’ll keep thinking about your approach.

Now, you said that these models are not naturally trained to perform such tasks, but currently ChatGPT is able to do it. For instance, if you try this prompt:

Here are 3 products, along with their price:
TOTO : price $45
TITI : price $37
TUTU : price $68

What is the cheapest of all?

It will accurately return that the cheapest product is TITI. Such behavior has been induced by fine-tuning, right? My goal is to make it work for potentially thousands of products and of course many more characteristics.

Yeah, I’ve heard about HyDE, but it’s just a way to improve the search approach. It doesn’t answer my question, which is how to search for an answer across potentially all documents.

This is a few-shot prompt, not fine-tuning. Few-shot prompting is very effective with LLMs, but as you note, token ceilings present challenges. And even if there weren’t token ceilings, there are practical limits. @jwatte is correct - the approach requires an intermediate layer that performs a transformation of some type. I tend to use aggregations of large data sets to set the table for AGI where analytics are needed.
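The aggregation idea mentioned above can be sketched like this: compute the analytics outside the model, then hand the model a compact summary instead of thousands of raw rows. The `name`/`price` fields and the prompt wording are hypothetical placeholders, not anyone's actual schema:

```python
def aggregate_products(products):
    """Pre-compute catalog-wide facts so they fit in a prompt.

    products: list of dicts with hypothetical "name" and "price" keys.
    """
    cheapest = min(products, key=lambda p: p["price"])
    return {
        "count": len(products),
        "cheapest_name": cheapest["name"],
        "min_price": cheapest["price"],
        "max_price": max(p["price"] for p in products),
    }

def build_prompt(question, summary):
    # The model answers from the pre-computed facts, not from raw rows
    facts = ", ".join(f"{k}={v}" for k, v in summary.items())
    return f"Known facts about the catalog: {facts}\n\nQuestion: {question}"
```

This scales to any catalog size, because the token cost of the summary is fixed no matter how many rows were aggregated.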

The suggestion to read about HyDE wasn’t necessarily to use it to craft your solution. It’s to give you more knowledge about embeddings.

You should immediately race over to CustomGPT, get a free trial, load it up with 100 of your product sheets, and then test it. If this produces favorable results, there’s a lot more to discuss if building this on your own IP footing is imperative. But this simple test will validate if your content (without modification) can be effectively utilized by LLMs to create the results you hope to provide users.

I’d be more worried about scale 1000.

Scale 3 works great, because there are so many tutorials and descriptions on the web (and thus in the training data) for how to solve that problem – pattern matching works well. 3 is also well below the number of attention heads the model is using at one time.

Scale 100 might work – IIRC, this is in the same ballpark as the number of layers and the number of attention heads, so a straightforward “selection” might still work. (This is very hand-wavy! Just order-of-magnitude estimates.)

Scale 1000 is more than the number of layers, and it’s more than the number of attention heads, so I’d expect that to show real misses on current LLMs, even if you can jam it into the context size (which is totally possible on gpt-4-32k, at least!)


My suggestion is to help the inquirer gain more insight, not perfect the production approach. CustomGPT can undoubtedly handle the scale, but can @kevin.guimard shape the content in a way that makes it work and one where he begins to see some daylight without purchasing a paid tier?

OK, I’ll bite: What about the CustomGPT model makes it able to pay attention to 1000 different input facts at the same time, and select the right one?

It looks to me as if it’s a standard embedding-match front-end to GPT-4, which I don’t understand how it would even theoretically be capable of correctly solving that workload.

It’s not a model, but it obviously has some magical embedding sauce to do what it does. Perhaps the CTO @alden can provide more insights.

I’m not offering that as a solution, simply trying to give the questioner some additional pathways to learn and experiment to better understand the possibilities.

We often find tools that can offer us make-vs-buy options. Not everyone is able to build things that are sometimes complicated. And lacking a detailed understanding of the full requirements, we’re all playing a bunch of guessing games.


You need to have some sort of ranking algorithm for the chunks returned from the semantic search (e.g. after retrieving from Pinecone). This ranking algorithm is needed not just to send the BEST chunks to ChatGPT (for completion) – but also to do pre-processing like removing duplicates.

At CustomGPT, we ended up building a proprietary ranking algorithm for the chunks in response to all the business requirements we were getting from people (it started with simple duplicate removal and then kept growing from there).

This seems to be working pretty well right now (though there is always room for improvement). Some of the bots being built using our system have tens of thousands of pages/documents covering multi-gig data.
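A minimal version of such a ranking layer might look like the sketch below. To be clear, this is not CustomGPT's algorithm (which is proprietary); it only shows the dedup-then-rank shape being described, with a crude word-overlap (Jaccard) test standing in for real near-duplicate detection:

```python
def rank_chunks(scored_chunks, top_k=3, dup_threshold=0.9):
    """Sort retrieved chunks by similarity score and drop near-duplicates.

    scored_chunks: list of (text, score) pairs, e.g. from a vector store.
    """
    def jaccard(a, b):
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

    kept = []
    for text, score in sorted(scored_chunks, key=lambda p: -p[1]):
        # Keep a chunk only if it isn't nearly identical to one already kept
        if all(jaccard(text, seen) < dup_threshold for seen, _ in kept):
            kept.append((text, score))
    return [text for text, _ in kept[:top_k]]
```

Only the surviving top-k chunks are then sent to the completion call, which both saves tokens and avoids wasting context on repeated text.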


The solution suggested by @jwatte with the SQL generation is the correct one IF you want to do analytics like “find the cheapest T-shirt”. Imagine a catalog with 1,000,000 products. If you want to run real analytics based on the structured data (e.g. price), then ChatGPT is not the right choice (unless you do something like what jwatte suggested).

However, if it’s a non-analytics thing (like semantic search with embeddings), then it should work with embeddings and Q&A.


First, the idea for me is to build the system myself. From what I understand, CustomGPT uses the embeddings approach to retrieve the relevant chunks and inject them into the prompt. But is that the only possible approach? The problem with it is that not all chunks of text may fit in the prompt.

The idea of the SQL layer is clever, but the price of shirts was just an example; I’d like to leverage much more complex (and NLP-intensive) information.

To be clearer, I was wondering whether it was possible to “train” a ChatGPT model on additional corpora of text so as to be able to ask questions about them naturally. It seems that the API does not allow it.

The embeddings route is the standard way to do this right now. There’s no built-in API functionality for this. An additional layer is to summarize each chunk (and sometimes summarize the summaries) in order to fit within prompt limits, but that’s the best we’ve got for now. Look into map_reduce and other summarization strategies.
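The map_reduce idea (summarize chunks, then summarize the summaries) reduces to a short loop. Here `summarize` is a hypothetical stand-in for an LLM summarization call, injected as a callable so the sketch stays self-contained:

```python
def map_reduce_summarize(chunks, summarize, batch_size=4):
    """Collapse many chunks into one summary that fits a prompt budget.

    Each pass summarizes batches of texts; passes repeat on the resulting
    summaries until a single text remains.
    summarize(list_of_texts) -> str stands in for an LLM call.
    """
    level = list(chunks)
    while len(level) > 1:
        level = [
            summarize(level[i:i + batch_size])
            for i in range(0, len(level), batch_size)
        ]
    return level[0] if level else ""
```

Note the trade-off: each summarization pass is lossy, so fine-grained facts (like one specific price out of thousands) can still be lost along the way — which is why the SQL/aggregation route remains better for true analytics.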

OK, that’s clear. Well, I have no choice but to go with the embeddings search approach then…