I’ve developed my own chunking strategy for a similar case:
Grouping Q&A Pairs by Topic
I organize related Q&A pairs into groups based on their topic, ensuring that each group is cohesive and focused.
Using a Moving Window for Chunks
I combine 2–3 Q&A pairs into a single chunk using a sliding window approach, but I’m careful not to mix Q&A pairs from different topics. This keeps each chunk relevant and logically structured.
Adding Context as a Prefix
To make each chunk understandable on its own, I prepend a short contextual introduction to it.
Appending Each Chunk with Its Topic
I finalize each chunk by appending the corresponding topic, creating a clear and complete unit of information.
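Here is a minimal Python sketch of that strategy, assuming the Q&A pairs are already tagged with a topic; the field names and the build_chunks helper are just illustrative, not part of any API:

```python
from collections import defaultdict

def build_chunks(qa_pairs, window=3, step=1, context_prefix=""):
    """Group Q&A pairs by topic, slide a window of 2-3 pairs over each
    group, and wrap every chunk with a context prefix and its topic.

    qa_pairs: list of dicts like {"topic": ..., "question": ..., "answer": ...}
    (an assumed structure for the example, not a prescribed format).
    """
    by_topic = defaultdict(list)
    for pair in qa_pairs:
        by_topic[pair["topic"]].append(pair)

    chunks = []
    for topic, pairs in by_topic.items():
        # Slide the window within a single topic only, so chunks never mix topics.
        for start in range(0, max(len(pairs) - window + 1, 1), step):
            window_pairs = pairs[start:start + window]
            body = "\n\n".join(
                f"Q: {p['question']}\nA: {p['answer']}" for p in window_pairs
            )
            # Prefix the contextual introduction and append the topic.
            chunks.append(f"{context_prefix}\n\n{body}\n\nTopic: {topic}".strip())
    return chunks
```

Groups smaller than the window still produce one chunk containing all of their pairs, so no topic is dropped.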
Since the OpenAI VectorStore's chunking is (currently) limited to a simple sliding window with an essentially static chunk size and overlap, and does no semantic boundary detection, I think I will stick to one Q+A per file. When creating the Q+As I already know what belongs together and what constitutes one semantic group, whereas the sliding window might well group together content that makes little sense to embed together.
I would love for the OpenAI Assistant/VectorStore API to have the ability to (optionally) provide the embedded and retrieved contents separately.
The Q&As are IMHO a typical use case for RAG and also a good example of where embedding one thing (the Q) but retrieving another thing (the Q plus the A) into the context can be very beneficial.
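Outside the Assistants API you can already do this with a hand-rolled retrieval step; a minimal sketch assuming the OpenAI embeddings endpoint and a small in-memory index (the model name, data structure, and retrieve helper are my own assumptions for illustration, not a supported feature of the VectorStore):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
EMBED_MODEL = "text-embedding-3-small"  # assumed model choice

def embed(texts):
    resp = client.embeddings.create(model=EMBED_MODEL, input=texts)
    return np.array([d.embedding for d in resp.data])

# Index: embed only the question, but keep the full Q+A as the payload
# that is later placed into the model's context.
qa_pairs = [
    {"question": "How do I reset my password?",
     "answer": "Use the 'Forgot password' link on the login page."},
]
question_vecs = embed([p["question"] for p in qa_pairs])

def retrieve(query, k=3):
    q = embed([query])[0]
    # Cosine similarity against the question embeddings only.
    sims = question_vecs @ q / (
        np.linalg.norm(question_vecs, axis=1) * np.linalg.norm(q)
    )
    top = np.argsort(-sims)[:k]
    # Return both question and answer for the context window.
    return [f"Q: {qa_pairs[i]['question']}\nA: {qa_pairs[i]['answer']}" for i in top]
```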
We have a similar setup using Assistants. Our conclusion is that as long as we aren't hitting the 10,000-file limit on the vector store, we can just put each of our small documents into its own file, and that all seems to work nicely. In fact, it's one way to get roughly the semantic boundaries you are looking for. I don't see a particular advantage of bundling multiple documents into the same file unless you are running into that file limit.
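For reference, attaching one small document per file is just a loop over the file upload and vector store attach calls. A rough sketch using the OpenAI Python SDK (the beta namespace and the example file contents are assumptions; the namespace in particular may differ between SDK versions):

```python
import io
from openai import OpenAI

client = OpenAI()
vector_store = client.beta.vector_stores.create(name="qa-pairs")

qa_files = {
    "reset-password.txt": "Q: How do I reset my password?\nA: Use the 'Forgot password' link.",
}

for name, text in qa_files.items():
    # One Q+A per file keeps each embedded unit semantically self-contained.
    uploaded = client.files.create(
        file=(name, io.BytesIO(text.encode("utf-8"))),
        purpose="assistants",
    )
    client.beta.vector_stores.files.create(
        vector_store_id=vector_store.id,
        file_id=uploaded.id,
    )
```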
Maybe you ran into this too: Do you know of a way to tell when the vector store is done embedding uploaded files? I had situations where I ran an inference and it clearly was not using the uploaded data, but when I repeated it a while later it was. So I'm looking for a way to reliably tell whether an uploaded file has really been fully embedded and is ready to be used by the Assistant.
We tried fetching the file after uploading and assumed that its state would not be "active" until after it had been processed. But we had a similar experience to yours: there was some delay after that before the assistant was taking advantage of all of the uploaded content, and we didn't find a reliable way to know when that process was finished. In our application this isn't that serious, because we can tolerate the delay, so we just live with it. I can understand why that might not be the case for you.
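For what it's worth, the vector store file object (as opposed to the raw uploaded file) does expose a status field, and polling that is probably the closest thing to a readiness check. A sketch, assuming the same beta namespace as above; we haven't verified that a "completed" status guarantees the assistant will use the content immediately, and the wait_until_indexed helper is just illustrative:

```python
import time
from openai import OpenAI

client = OpenAI()

def wait_until_indexed(vector_store_id, file_id, timeout=120, interval=2):
    """Poll the vector store file until it is no longer 'in_progress'."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        vs_file = client.beta.vector_stores.files.retrieve(
            vector_store_id=vector_store_id,
            file_id=file_id,
        )
        if vs_file.status != "in_progress":
            # Expected terminal values: "completed", "failed", or "cancelled".
            return vs_file.status
        time.sleep(interval)
    raise TimeoutError(f"file {file_id} still in_progress after {timeout}s")
```

If I remember correctly, the SDK also ships helpers such as vector_stores.file_batches.upload_and_poll that wrap this kind of loop for a batch of files, which may be more convenient than polling each file yourself.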