Issues creating an index from a JSON array

Hi

I’m having issues when I create an index from a JSON file.
I have a JSON file with 5 different records, something quite simple, just name, address and age. I have created an index from it using the llama_index library.
When loading the index and querying the data, like “give me the age of Javi”, it answers with data from another item in the array.

I’m using text-davinci-003, and I’m pretty new to this, so I’m sure I’m doing something wrong. I suspect the issue is that GPT is treating each item in the JSON as correlated with the others.

What is the right way to create the index so that each item is treated independently?

Thanks

It sounds like the model is not able to distinguish between the different items in the JSON file and is treating them as correlated.
A suggestion to make each item independent: create a separate document for each item in the JSON file, rather than one document that contains all the items.

Each document would then have its own name, address, and age fields. This should let the model treat each item independently and give the correct answer when you query the index.

Something like this (illustrative pseudocode; adapt the API calls to whatever indexing library you use):

// Create a document for each item in the JSON file
const documents = data.map(item => {
const doc = new Document();
doc.setField("name", item.name);
doc.setField("address", item.address);
doc.setField("age", item.age);
return doc;
});

// Add the documents to the index
await index.addDocuments(documents);

I hope this helps. If you try it, please let me know the results.

Thanks. I’m using Python for this, with llama_index/gpt_index. Do you know the equivalent in those libraries, or should I use pandas instead?

With the llama_index (gpt_index) library in Python, you can create a separate document for each item in your JSON file: build a list of dictionaries, where each dictionary represents one document.
Example for llama_index:

...

# List of dictionaries, where each dictionary represents a separate document
documents = [
    {"name": "Alice", "address": "123 Ali st", "age": 23},
    {"name": "Bob", "address": "456 Rob Ave", "age": 34},
    {"name": "Javi", "address": "789 Jaji Rd", "age": 45},
    {"name": "Karl", "address": "1011 Rak St", "age": 54}
]

# Create an index with the list of documents
index = llama_index.Index(documents)
...
# Query the index for the age of Javi
result = index.search("age of Javi")
...

If you prefer to use pandas: you can read the JSON file into a pandas DataFrame and then convert each row into a dictionary representing a separate document. That is another example; I can post it later if you want.
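For reference, the pandas route would look roughly like this. This is only a sketch: the JSON sample is invented to match your description, and you would read your real file instead of the inline string.

```python
import io

import pandas as pd

# Hypothetical JSON array matching the described structure
# (in practice you would pass your file path to pd.read_json)
raw = """[
  {"name": "Alice", "address": "123 Ali st", "age": 23},
  {"name": "Javi", "address": "789 Jaji Rd", "age": 45}
]"""

# Read the array into a DataFrame, then split it into one
# dictionary per row, i.e. one candidate document per person
df = pd.read_json(io.StringIO(raw))
records = df.to_dict(orient="records")
```

Each entry of `records` is then an independent dictionary you can turn into its own document before indexing.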

Tell me which is best for your case.

Yeah, that’s what I’m doing, and when I query what the address of Javi is, it tells me that Alice’s address is blah blah; that’s the confusion.

Do you know if the result is displaced by one up or one down? Or is it random?

Assuming it is displaced one up: what happens when you query for the first one?
(Or the inverse for the one-down case?)

It returns different results from time to time; it doesn’t follow a pattern.

The query for the address of “Javi” returning wrong addresses suggests there may be a problem with the query itself, rather than the index. Check whether it works using Llamedos to create an index, like this:

...
# line added:
from llama_index import LlamedosIndex
...
# List of dictionaries, where each dictionary represents a separate document
documents = [
    {"name": "Alice", "address": "123 Ali st", "age": 23},
    {"name": "Bob", "address": "456 Rob Ave", "age": 34},
    {"name": "Javi", "address": "789 Jaji Rd", "age": 45},
    {"name": "Karl", "address": "1011 Rak St", "age": 54}
]

# Create an index with the list of documents - line changed:
index = LlamedosIndex(documents)
...
# Query the index for the age of Javi - line changed:
result = index.query("age of Javi")
...

Please let me know. If it is not working, then without seeing your complete code I suppose it is time to choose another library like Whoosh or Elasticsearch, or to switch entirely to pandas or something else.

Hi

LlamedosIndex doesn’t exist in llama_index. What’s the right method to use?

Try LlDocumentIndex instead:

# Create an index with the list of documents
index = LlDocumentIndex(documents)

Are those answers generated with GPT itself? That method doesn’t exist either.

Ok. Try this page (I suppose you already did):
Vector Store Index - LlamaIndex :llama: 0.6.5 (gpt-index.readthedocs.io)
I found this example:

import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)
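Adapting that to the one-document-per-record idea from earlier, something like the sketch below should work. The per-record text formatting is my own invention, and the `Document` constructor and `as_query_engine()` call are taken from the 0.6.x docs, so please check them against your installed version before relying on them (running the commented part also requires a valid `OPENAI_API_KEY`).

```python
import json

# Hypothetical sample data mirroring your JSON array
raw = """[
  {"name": "Alice", "address": "123 Ali st", "age": 23},
  {"name": "Javi", "address": "789 Jaji Rd", "age": 45}
]"""

# Build one self-contained text blob per record, so each person
# becomes its own document instead of sharing one blob with the rest
texts = [
    f"Name: {r['name']}. Address: {r['address']}. Age: {r['age']}."
    for r in json.loads(raw)
]

# With llama_index 0.6.x the indexing and query step would then be:
#   from llama_index import Document, GPTVectorStoreIndex
#   index = GPTVectorStoreIndex.from_documents([Document(t) for t in texts])
#   response = index.as_query_engine().query("What is the age of Javi?")
```

Because each `Document` holds exactly one record, the retrieval step should only hand the model the chunk about Javi, which is what prevents the cross-contamination you saw.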

GPTVectorStoreIndex exists in llama_index, from that page:

class llama_index.indices.vector_store.base.GPTVectorStoreIndex

I am sorry for the previous responses. With your permission I would like to delete them; I was completely misinformed.

I don’t know why they make these classes so difficult: long names, so many requirements, just to do a simple indexing.