Hello all,
I’ve been trying to complete the question-answering tutorial Question_answering_using_embeddings.ipynb using my own dataset. I downloaded the example CSV file, copied my own data into it, and re-saved it, but I cannot get the code to run on that dataset. The code runs perfectly fine with the sample dataset, but when I replace the sample data with my own (always making sure it is saved as a CSV file), it always errors past line 48. I have tried changing the column data types using Python, removing any special characters, and tokenizing by hand using the OpenAI website. I’ve tried, I kid you not, about three days’ worth of fixes with no luck. ChatGPT is now repeating recommendations without any success, unfortunately.
I continually receive this error:
```
ValueError                                Traceback (most recent call last)
Cell In [74], line 1
----> 1 prompt = construct_prompt(
      2     "What is a WOC Nurse?",
      3     document_embeddings,
      4     df
      5 )
      7 print("===\n", prompt)

Cell In [73], line 16, in construct_prompt(question, context_embeddings, df)
     13 document_section = df.loc[section_index]
     15 chosen_sections_len += document_section.tokens + separator_len
---> 16 if chosen_sections_len > MAX_SECTION_LEN:
     17     break
     19 chosen_sections.append(SEPARATOR + document_section.content.replace("\n", " "))

File /shared-libs/python3.9/py/lib/python3.9/site-packages/pandas/core/generic.py:1442, in NDFrame.__nonzero__(self)
   1440 @final
   1441 def __nonzero__(self):
-> 1442     raise ValueError(
   1443         f"The truth value of a {type(self).__name__} is ambiguous. "
   1444         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1445     )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
You can see this in the dataset found here: https://docs.google.com/spreadsheets/d/e/2PACX-1vSs9Ok5FUrhAOu_BnpLwV63bwpLylRtUWBDE7onAX1zrZW0Sz4gBEtBN-KtsBiC1DhKyhhZjNXfNf0i/pub?output=csv
If you only use the first chapter, there is no issue. However, once anything past line 48 is read (it took a lot of trial and error to determine this), it no longer works, and I get either the error noted above or an error stating that the system cannot read the JSON content (I’ve also tried converting the CSV file to JSON, but I still get the error above).
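For anyone trying to reproduce this: the traceback shows that `chosen_sections_len` is a pandas Series at the `if` on line 16, which would happen if `df.loc[section_index]` returned several rows instead of one. One possible cause (an assumption, not confirmed against this dataset) is an index label that appears more than once. The sketch below uses a small made-up DataFrame with a single-level `heading` index, which is simpler than the tutorial's own index. `duplicated_labels` and `is_ambiguous` are hypothetical helper names, and the `MAX_SECTION_LEN` value is only a placeholder.

```python
import pandas as pd

# Made-up stand-in for the tutorial's DataFrame (not the real dataset).
# The label "Care" appears twice in the index on purpose.
df = pd.DataFrame(
    {
        "heading": ["Intro", "Scope", "Care", "Care"],
        "content": ["a", "b", "c", "d"],
        "tokens": [10, 12, 8, 9],
    }
).set_index("heading")

MAX_SECTION_LEN = 500  # placeholder value for illustration


def duplicated_labels(frame):
    """Return the index labels that occur more than once."""
    return frame.index[frame.index.duplicated()].unique().tolist()


def is_ambiguous(frame, label):
    """Return True if the tutorial-style length check raises for this label."""
    section = frame.loc[label]  # one row -> Series, repeated label -> DataFrame
    try:
        if section.tokens > MAX_SECTION_LEN:
            pass
    except ValueError:
        # Raised when section.tokens is a Series rather than a single number.
        return True
    return False


print(duplicated_labels(df))
print(is_ambiguous(df, "Intro"), is_ambiguous(df, "Care"))
```

If `duplicated_labels(df)` returns a non-empty list on the real data, those rows may be worth inspecting first.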
Unfortunately, I am still quite new to Python, so any recommendations or assistance with this issue would be much appreciated.
Thank you so much for your assistance!