Custom AI model with contextual product data in the form of txt, pdf, and API documents

We intend to create an AI model that has contextual knowledge about a given product. The model will be trained on the documents associated with the product, such as requirements documents, functional specifications, and technical designs. The trained model should retain the context of the product and provide solutions to product-specific queries or problems.

Currently we are not able to train an AI model on large product docs without losing context. The LLMs we have used split the docs into multiple portions, and the responses lose context as a result. We also ran into token and character limits when we tried to use OpenAI.

You might find this post interesting: "The length of the embedding contents" (#23 by curt.kennedy). It covers various topics, one of which is overlap between embedded chunks. There is also the topic of meta summation and meta headings that can be embedded with each chunk to enable contextual relevance.
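
To make the overlap and meta-heading ideas concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from the linked post. It assumes words are an acceptable stand-in for tokens, and the names `Chunk`, `chunk_with_overlap`, and `build_chunks`, as well as the default sizes, are hypothetical. Each chunk shares some words with its neighbour and carries a heading built from the document and section titles, so the text you send for embedding still says where it came from.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # meta heading, e.g. "Document > Section" (illustrative format)
    text: str     # body of the chunk

    def to_embedding_input(self) -> str:
        # Prepend the heading so the embedded text carries its own context.
        return f"{self.heading}\n{self.text}"


def chunk_with_overlap(words, size, overlap):
    """Split a list of words into windows of `size` words,
    where consecutive windows share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return windows


def build_chunks(doc_title, section_title, text, size=200, overlap=50):
    """Turn one section of a document into overlapping chunks with a meta heading.
    Word counts are only a rough proxy for model tokens (assumption)."""
    heading = f"{doc_title} > {section_title}"
    words = text.split()
    return [Chunk(heading, " ".join(w)) for w in chunk_with_overlap(words, size, overlap)]


if __name__ == "__main__":
    section = "The login service validates credentials and issues a session token"
    for chunk in build_chunks("Functional Spec", "Login", section, size=6, overlap=2):
        print(repr(chunk.to_embedding_input()))
```

You would pass the result of `to_embedding_input()` to whatever embedding call you use. The chunk size and overlap are things to tune against your own documents and token limits.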