Structured vs unstructured data with LlamaIndex

Hi,
I’m trying LlamaIndex to create data input for GPT from some Google Docs.
My goal is to have a chatbot that uses my knowledge base to give me answers.

This is the code that I’m using:

from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
from langchain.chat_models import ChatOpenAI

def construct_index(directory_path):
    # Chunking settings: documents are split into ~600-token chunks
    # with a 20-token overlap before being embedded.
    max_input_size = 4096
    num_outputs = 512
    max_chunk_overlap = 20
    chunk_size_limit = 600

    prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

    llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo", max_tokens=num_outputs))

    documents = SimpleDirectoryReader(directory_path).load_data()

    index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor, prompt_helper=prompt_helper)

    index.save_to_disk('index.json')

    return index

I opened the output file to understand better what’s going on, and the data is parsed in a way I find very unusual… I don’t really comprehend the structure. For example, some paragraphs are split across different index entries.
I then tried ChatGPT with this data and… it kinda sucked.
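For what it’s worth, the splitting you’re describing sounds like the chunker doing its job: with chunk_size_limit=600 and max_chunk_overlap=20, long passages get cut into fixed-size pieces that overlap slightly, regardless of paragraph boundaries. A rough pure-Python sketch of that mechanic (word-based instead of token-based, purely for illustration):

```python
def chunk_words(text, chunk_size=600, overlap=20):
    """Split text into overlapping fixed-size chunks.

    Illustrative only: the real chunker counts tokens, not words,
    but the overlap mechanics are the same. Note that a paragraph
    longer than chunk_size inevitably ends up split across chunks.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk repeats the last `overlap` words of the previous one, which is also why you may see duplicated fragments in the saved index file.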

So my question here is: would ChatGPT perform better if I structured the data better myself? For example, I think it would be easier to have the data divided by paragraph.
Or maybe use FAQs instead of user manuals, so the answers are shorter than entire paragraphs. That way each index entry could hold a question and its corresponding answer, for example.
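One cheap experiment along those lines: pre-chunk the text yourself before indexing, so each chunk is a whole paragraph or one Q/A pair rather than an arbitrary 600-token window, then build one document per chunk. A minimal sketch of the splitting side (plain string handling, no library calls; the Q:/A: layout is an assumption about how your FAQ file would look):

```python
def split_paragraphs(text):
    """Split raw text into paragraph-sized chunks on blank lines."""
    paragraphs = [p.strip() for p in text.split("\n\n")]
    return [p for p in paragraphs if p]

def split_faq(text):
    """Group 'Q:' / 'A:' lines so each chunk is one question plus its answer.

    Assumes a simple alternating Q:/A: layout; adapt to your docs.
    """
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("Q:") and current:
            chunks.append("\n".join(current))  # flush the previous Q/A pair
            current = []
        if line.strip():
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

With chunks like these, each retrieved entry is self-contained, so the question and its answer always land in the same chunk instead of being split by a token limit.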

What do you think?

I have a similar query; I just noticed this post after I posted a new topic. Did you ever find out any more about structuring your source data?