Saving Embeddings

Hi all,

I’ve been following this tutorial:

I’ve finally got my dataset to work and can calculate embeddings using document_embeddings = compute_doc_embeddings(df). This works great, and the remainder of the code functions without issue. However, no matter how I try to save the embeddings, when I try to load the csv file with the saved embeddings using document_embeddings = load_embeddings(“”), I receive an error every time.

Any recommendations on how to save the csv file once the embeddings are recalculated?


I store the embedding in a database, with a structure similar to this (at the moment):

  create_table "embeddings", force: :cascade do |t|
    t.string "openai_id"
    t.string "model"
    t.string "prompt"
    t.string "vector"
    t.string "userid"
    t.datetime "created_at", precision: 6, null: false
    t.datetime "updated_at", precision: 6, null: false
  end

Hope this helps.
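
For anyone wanting to try the same idea from Python, here is a minimal sketch of storing a vector in a text column and reading it back. It assumes the vector is serialized as JSON (the post above does not say how the "vector" string is encoded), and it uses an in-memory SQLite table with made-up column names and values purely for illustration.

```python
import json
import sqlite3


def vector_to_text(vector):
    """Serialize a list of floats to a JSON string for a text column."""
    return json.dumps(vector)


def text_to_vector(text):
    """Parse the JSON string back into a list of floats."""
    return [float(x) for x in json.loads(text)]


# Hypothetical table, loosely mirroring the schema above (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE embeddings (prompt TEXT, model TEXT, vector TEXT)")

example_vector = [0.25, -0.5, 0.125]  # placeholder values, not real embeddings
conn.execute(
    "INSERT INTO embeddings (prompt, model, vector) VALUES (?, ?, ?)",
    ("hello world", "example-model", vector_to_text(example_vector)),
)

row = conn.execute(
    "SELECT vector FROM embeddings WHERE prompt = ?", ("hello world",)
).fetchone()
restored = text_to_vector(row[0])
```

Real embeddings have a thousand or more dimensions, so the serialized string gets long; a text or JSON column type without a short length limit is probably safer than a length-limited string.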

It works for me when I load it locally with the supplied load_embeddings function (it took a minute or two though).

It could be a timeout (it is a large file and took a while to download here).

Or it could be a memory issue. I would try it locally if you can, or with less data, to check that it is working. If you are on Google Drive or something, you can copy the file locally there to speed things up.

You can try reading only the first 10 lines by adding an nrows parameter:

df = pd.read_csv(fname, header=0, nrows=10)

I think that should stream the first 10 without needing to load the whole file. For the full file you will still be better off having it local though.
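
On the original question of saving, here is a rough sketch of a save/load round trip with pandas. It assumes document_embeddings is a dict mapping a (title, heading) tuple to a list of floats, and that the CSV should have "title" and "heading" columns followed by one column per dimension named "0", "1", and so on. Both the layout and the helper names save_embeddings and read_embeddings are my assumptions, not something taken from the tutorial, so adjust them to whatever your load_embeddings expects.

```python
import os
import tempfile

import pandas as pd


def save_embeddings(document_embeddings, fname):
    """Write one row per document: title, heading, then one column per dimension."""
    rows = []
    for (title, heading), vector in document_embeddings.items():
        row = {"title": title, "heading": heading}
        row.update({str(i): value for i, value in enumerate(vector)})
        rows.append(row)
    # index=False keeps the pandas row index out of the file
    pd.DataFrame(rows).to_csv(fname, index=False)


def read_embeddings(fname, nrows=None):
    """Read the CSV back into a dict; nrows limits how many rows are parsed."""
    df = pd.read_csv(fname, header=0, nrows=nrows, float_precision="round_trip")
    dims = sorted(int(c) for c in df.columns if c not in ("title", "heading"))
    return {
        (r["title"], r["heading"]): [float(r[str(i)]) for i in dims]
        for _, r in df.iterrows()
    }


# Placeholder data standing in for real embeddings
example = {
    ("Doc A", "Intro"): [0.25, 0.5, 0.75],
    ("Doc B", "Usage"): [1.5, -2.0, 0.125],
}
path = os.path.join(tempfile.mkdtemp(), "embeddings.csv")
save_embeddings(example, path)

restored = read_embeddings(path)             # full file
first_only = read_embeddings(path, nrows=1)  # quick sanity check on one row
```

One common cause of load errors is an extra unnamed index column in the file, which to_csv writes by default; passing index=False avoids that. Checking a few rows with nrows first is a cheap way to confirm the column layout before loading the full file.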