Issues creating an index from a JSON array

Hi

I’m having issues when I create an index from a JSON file.
I have a JSON file with 5 different records, something quite simple, just name, address and age. I have created an index from it using the llama_index library.
When loading the index and querying the data, like “give me the age of Javi”, it answers with data from another item in the array.

I’m using text-davinci-003, and I’m pretty new to this, so I’m sure I’m doing something wrong. I suspect the issue is that GPT is treating each item in the JSON as correlated with the others.

What is the right way to create the index so that each item is treated independently?

Thanks

It sounds like the model is not able to distinguish between the different items in the JSON file and is treating them as correlated.
A suggestion to make each item independent: create a separate document for each item in the JSON file, rather than one document that contains all the items.

Each document would then have its own name, address, and age fields. This should let the model treat each item independently and give the correct answer when you query the index.

Something like this (illustrative pseudocode; adapt the API calls to whatever indexing library you use):

// Create a document for each item in the JSON file
const documents = data.map(item => {
const doc = new Document();
doc.setField("name", item.name);
doc.setField("address", item.address);
doc.setField("age", item.age);
return doc;
});

// Add the documents to the index
await index.addDocuments(documents);

I hope this helps. If you try it, please let me know the results.

Thanks. I’m using Python for this, with llama_index/gpt_index. Do you know the equivalent in those libraries, or should I use pandas instead?

With the llama_index (gpt_index) library in Python, you can create a separate document for each item in your JSON file: build a list of dictionaries, where each dictionary represents one document.
Example for llama_index:

...

# List of dictionaries, where each dictionary represents a separate document
documents = [
    {"name": "Alice", "address": "123 Ali st", "age": 23},
    {"name": "Bob", "address": "456 Rob Ave", "age": 34},
    {"name": "Javi", "address": "789 Jaji Rd", "age": 45},
    {"name": "Karl", "address": "1011 Rak St", "age": 54}
]

# Create an index with the list of documents
index = llama_index.Index(documents)
...
# Query the index for the age of Javi
result = index.search("age of Javi")
...

If you prefer to use pandas: you can read the JSON file into a pandas DataFrame and then convert each row into a dictionary representing a separate document. That is another example; I can post it later if you want.
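For reference, the pandas route would look roughly like this. This is only a sketch: the JSON sample is invented to match your description, and you would read your real file instead of the inline string.

```python
import io

import pandas as pd

# Hypothetical JSON array matching the described structure
# (in practice you would pass your file path to pd.read_json)
raw = """[
  {"name": "Alice", "address": "123 Ali st", "age": 23},
  {"name": "Javi", "address": "789 Jaji Rd", "age": 45}
]"""

# Read the array into a DataFrame, then split it into one
# dictionary per row, i.e. one candidate document per person
df = pd.read_json(io.StringIO(raw))
records = df.to_dict(orient="records")
```

Each entry of `records` is then an independent dictionary you can turn into its own document before indexing.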

Tell me which is best for your case.

Yeah, that’s what I’m doing, and when I query what the address of Javi is, it tells me that Alice’s address is blah blah; that’s the confusion.

Do you know if the result is displaced by one up or one down? Or is it random?

Assuming it is displaced one up: what happens when you query for the first one?
(Or the inverse for the one-down case?)

It returns different results from time to time; it doesn’t follow a pattern.

The query for the address of “Javi” returning wrong addresses suggests there may be a problem with the query itself, rather than the index. Check whether it works using Llamedos to create an index, like this:

...
# line added:
from llama_index import LlamedosIndex
...
# List of dictionaries, where each dictionary represents a separate document
documents = [
    {"name": "Alice", "address": "123 Ali st", "age": 23},
    {"name": "Bob", "address": "456 Rob Ave", "age": 34},
    {"name": "Javi", "address": "789 Jaji Rd", "age": 45},
    {"name": "Karl", "address": "1011 Rak St", "age": 54}
]

# Create an index with the list of documents - line changed:
index = LlamedosIndex(documents)
...
# Query the index for the age of Javi - line changed:
result = index.query("age of Javi")
...

Please let me know. If it is not working, then without seeing your complete code I suppose it is time to choose another library like Whoosh or Elasticsearch, or to switch entirely to pandas or something else.

Hi

LlamedosIndex doesn’t exist in llama_index. What’s the right method to use?

Try LlDocumentIndex instead:

# Create an index with the list of documents
index = LlDocumentIndex(documents)

Are those answers generated with GPT itself? That method doesn’t exist either.

Ok. Try this page (I suppose you already did):
Vector Store Index - LlamaIndex :llama: 0.6.5 (gpt-index.readthedocs.io)
I found this example:

import os
os.environ["OPENAI_API_KEY"] = 'YOUR_OPENAI_API_KEY'
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader('data').load_data()
index = GPTVectorStoreIndex.from_documents(documents)
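Adapting that to the one-document-per-record idea from earlier, something like the sketch below should work. The per-record text formatting is my own invention, and the `Document` constructor and `as_query_engine()` call are taken from the 0.6.x docs, so please check them against your installed version before relying on them (running the commented part also requires a valid `OPENAI_API_KEY`).

```python
import json

# Hypothetical sample data mirroring your JSON array
raw = """[
  {"name": "Alice", "address": "123 Ali st", "age": 23},
  {"name": "Javi", "address": "789 Jaji Rd", "age": 45}
]"""

# Build one self-contained text blob per record, so each person
# becomes its own document instead of sharing one blob with the rest
texts = [
    f"Name: {r['name']}. Address: {r['address']}. Age: {r['age']}."
    for r in json.loads(raw)
]

# With llama_index 0.6.x the indexing and query step would then be:
#   from llama_index import Document, GPTVectorStoreIndex
#   index = GPTVectorStoreIndex.from_documents([Document(t) for t in texts])
#   response = index.as_query_engine().query("What is the age of Javi?")
```

Because each `Document` holds exactly one record, the retrieval step should only hand the model the chunk about Javi, which is what prevents the cross-contamination you saw.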

GPTVectorStoreIndex exists in llama_index, from that page:

class llama_index.indices.vector_store.base.GPTVectorStoreIndex

I am sorry for the previous responses. With your permission I would like to delete them; I was completely misinformed.

I don’t know why they make these classes so difficult: long names, so many requirements, just to do a simple indexing.