Thanks for the response, but …
I load the data like this:
```python
import pandas as pd

df = pd.read_csv('luna_skills_copy.csv')
columns = ["name", "skill", "mastery"]
df = df.reindex(columns=columns)
print(f"{len(df)} rows in the data.")
df.sample(5)
```
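A side note on the column selection: `reindex(columns=...)` never raises. If a name is misspelled or missing from the CSV, it silently adds an all-NaN column. Plain indexing raises a `KeyError` instead, and `read_csv(usecols=...)` skips the unwanted columns at parse time. Here is a minimal sketch that uses an in-memory CSV as a stand-in for `luna_skills_copy.csv`. The sample rows and the `extra` column are made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for luna_skills_copy.csv
csv_text = """name,skill,mastery,extra
Ann,python,expert,x
Bob,sql,beginner,y
"""

# Option 1: only parse the wanted columns (kept in file order)
df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "skill", "mastery"])

# Option 2: read everything, then select; a typo here raises KeyError
full = pd.read_csv(io.StringIO(csv_text))
selected = full[["name", "skill", "mastery"]]

# reindex does not raise: the misspelled "mastry" comes back as all-NaN
reindexed = full.reindex(columns=["name", "skill", "mastry"])
print(reindexed["mastry"].isna().all())  # True
```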
I reindex because I only want to use those three columns. Then I compute the embeddings like this:
```python
def compute_doc_embeddings(df: pd.DataFrame) -> dict[int, list[float]]:
    """
    Create an embedding for each row in the dataframe using the OpenAI Embeddings API.
    Return a dictionary that maps each row's index label to its embedding vector.
    """
    # get_embedding() is defined elsewhere; skill and mastery are embedded together
    return {
        idx: get_embedding(r.skill + ' ' + r.mastery) for idx, r in df.iterrows()
    }
```
I embed skill and mastery together because I need to find people with a certain skill, and sometimes also a certain level. That part works.
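`df.iterrows()` yields the row's index label, not a `(str, str)` tuple. With the default `RangeIndex`, the keys of the returned dict are therefore integers, so `dict[int, list[float]]` is the accurate annotation. The self-contained sketch below makes this visible. It uses a hypothetical `fake_embedding` in place of `get_embedding` so it runs without an API key, and the sample rows are made up.

```python
import pandas as pd


def fake_embedding(text: str) -> list[float]:
    """Hypothetical stand-in for get_embedding: a tiny deterministic vector."""
    return [float(len(text)), float(text.count(" "))]


def compute_doc_embeddings(df: pd.DataFrame) -> dict:
    """Map each row's index label to the embedding of 'skill mastery'."""
    return {
        idx: fake_embedding(f"{r.skill} {r.mastery}") for idx, r in df.iterrows()
    }


df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "skill": ["python", "sql"], "mastery": ["expert", "beginner"]}
)

# Default RangeIndex: keys are the row labels 0 and 1
embeddings = compute_doc_embeddings(df)

# Index by name first (assign or pass the result; set_index is not in-place)
by_name = compute_doc_embeddings(df.set_index("name"))
```

If you would rather key the embeddings by person, passing `df.set_index("name")` gives string keys. The `skill` and `mastery` columns are still available on each row.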
But the first line of my embeddings file is missing its first element:
```
,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60, …
```
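The empty first cell is most likely the index column. `to_csv()` writes the DataFrame index as the first column by default, and an unnamed index gets an empty header cell. The save code isn't shown, so this sketch assumes the file was produced by something like `pd.DataFrame.from_dict(embeddings, orient="index").to_csv(...)`. The vectors are made up. Passing `index_label` gives that column a name:

```python
import io

import pandas as pd

# Hypothetical embeddings keyed by row index, as compute_doc_embeddings returns
embeddings = {0: [0.1, 0.2, 0.3], 1: [0.4, 0.5, 0.6]}

# One row per original df row, one column per vector component (0, 1, 2)
emb_df = pd.DataFrame.from_dict(embeddings, orient="index")

buf = io.StringIO()
emb_df.to_csv(buf)  # unnamed index -> empty first header cell
default_header = buf.getvalue().splitlines()[0]
print(default_header)  # ,0,1,2

buf = io.StringIO()
emb_df.to_csv(buf, index_label="row")  # name the index column explicitly
labelled_header = buf.getvalue().splitlines()[0]
print(labelled_header)  # row,0,1,2
```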
So if I set the first column as the index:

```python
df.set_index(["name"])
```

I get this error:

```
ValueError: invalid literal for int() with base 10: 'name'
```
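Two things may be going on here. First, `set_index()` returns a new DataFrame instead of modifying `df`, so that line alone changes nothing unless you assign it (`df = df.set_index("name")`). Second, `int()` failing on `'name'` suggests that some loading code converts every column header to an integer, for example to find the embedding dimension, and now hits the `name` column. That loading code isn't shown, so this is a guess.

Below is a minimal sketch of a loader that takes the first column as the key, whether its header is `name` or empty, and only converts the remaining numeric headers. The `load_embeddings` name and the file layout are assumptions.

```python
import io
from collections.abc import Hashable

import pandas as pd


def load_embeddings(csv_file) -> dict[Hashable, list[float]]:
    """Read a CSV whose first column is a key and the rest are vector components."""
    # index_col=0 moves the key column (named or unnamed) out of df.columns
    df = pd.read_csv(csv_file, index_col=0)
    # Only the component headers "0", "1", ... remain, so int() is safe here
    dims = sorted(int(c) for c in df.columns)
    return {key: [float(row[str(i)]) for i in dims] for key, row in df.iterrows()}


emb = load_embeddings(io.StringIO("name,0,1,2\nAnn,0.1,0.2,0.3\nBob,0.4,0.5,0.6\n"))
print(emb["Ann"])  # [0.1, 0.2, 0.3]
```

Moving the key column into the index before converting headers means `int()` only ever sees the component numbers. That avoids the `'name'` error.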