Hi,
My problem, besides not knowing Python, is that I have saved embeddings to a file that looks like this:
0,0.0031115561723709106,0.00018902790907304734,-0.00190595886670053,-0.029547588899731636,-0.022286130115389824,0.018968993797898293,-0.029436087235808372,-0.0378822386264801,-0.02245337888598442,-0.018146678805351257,0.02940821275115013,-0.020348811522126198,-0.009881717152893543,-0.008892151527106762 …
Each row is an index followed by the vector, which looks OK. I saved this file after computing the embeddings, and using the freshly computed ones directly works fine.
To save time I want to load them from the file instead, but that doesn't work.
This is the function I have:
    def load_embeddings(fname: str) -> dict[tuple[str, str], list[float]]:
        df = pd.read_csv(fname, header=0)
        max_dim = max([int(c) for c in df.columns])
        return {
            (i): [r[str(i)] for i in range(max_dim + 1)] for _, r in df.iterrows()
        }
With it I get: ValueError: invalid literal for int() with base 10: 'Unnamed: 0'
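As far as I can tell, the 'Unnamed: 0' name comes from the row index that to_csv writes as a first column with an empty header. A minimal sketch that reproduces this, assuming made-up two-document data and an in-memory buffer in place of the real file (both are stand-ins, not my actual embeddings):

```python
import io

import pandas as pd

# Hypothetical stand-in data: two documents, three dimensions each.
document_embeddings = {0: [0.1, 0.2, 0.3], 1: [0.4, 0.5, 0.6]}

# Write to an in-memory buffer with the same call used for saving.
buffer = io.StringIO()
pd.DataFrame(document_embeddings).T.to_csv(buffer)
buffer.seek(0)

# Read it back the way load_embeddings does.
df = pd.read_csv(buffer, header=0)
columns = list(df.columns)
print(columns)  # the first name is not a number, so int() fails on it
```

The first column label is the one that int(c) cannot convert; the remaining labels are the dimension numbers as strings.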
I also tried adding this line after df = pd.read_csv(fname, header=0):
    df = df.set_axis(['column1_name', 'column2_name', 'column3_name'], axis=1)
but then I get:
ValueError: invalid literal for int() with base 10: 'Unnamed: 0'
Looking at the file, I can see that each row holds an index and a vector, so loading it this way does not seem right to me, but I do not know what the right way is. Sorry, I do not know Python and am just trying things.
This is how I save them:
    import os
    saved_embeddings = "path to file"
    if os.path.exists(saved_embeddings):
        document_embeddings = load_embeddings(saved_embeddings)
    else:
        # document_embeddings = compute_embedding_with_backoff(df=df)
        document_embeddings = compute_doc_embeddings(df)
        pd.DataFrame(document_embeddings).T.to_csv(saved_embeddings)
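For reference, here is a rough sketch of a loader that matches this way of saving. It assumes the first CSV column is an integer row index (as in the sample row above) and that every other column is one dimension of the vector. The name load_embeddings_sketch and the toy data are made up for illustration, and an in-memory buffer stands in for the real file:

```python
import io

import pandas as pd


def load_embeddings_sketch(source) -> dict[int, list[float]]:
    """Read a CSV written by DataFrame(...).T.to_csv(...) back into a dict."""
    # index_col=0 tells pandas the unnamed first column is the row index,
    # so it no longer shows up among the data columns.
    df = pd.read_csv(source, header=0, index_col=0)
    # Assumption: the index values are integers, one per document.
    return {int(idx): row.tolist() for idx, row in df.iterrows()}


# Hypothetical stand-in data: two documents, three dimensions each.
original = {0: [0.1, 0.2, 0.3], 1: [0.4, 0.5, 0.6]}

# Save exactly as in the snippet above, but to a buffer instead of a path.
buffer = io.StringIO()
pd.DataFrame(original).T.to_csv(buffer)
buffer.seek(0)

loaded = load_embeddings_sketch(buffer)
print(loaded)
```

With a real file, the path string would be passed in place of the buffer. If the keys are not plain integers, the int(idx) conversion would need to change.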