The following code:
import pandas as pd
import tiktoken
from openai.embeddings_utils import get_embedding  # assumed import: legacy helper (openai < 1.0), as in the OpenAI cookbook

embedding_model = "text-embedding-ada-002"
embedding_encoding = "cl100k_base"  # this is the encoding for text-embedding-ada-002
max_tokens = 8000  # the maximum for text-embedding-ada-002 is 8191
top_n = 1000

# load the source data and keep only the columns needed for embedding
input_datapath = "data/pm_sources.csv"
df = pd.read_csv(input_datapath)
df = df[["title", "authors", "abstract"]]
df = df.dropna()
df["combined"] = (
    "Title: " + df.title.str.strip() + "; Abstract: " + df.abstract.str.strip()
)

encoding = tiktoken.get_encoding(embedding_encoding)
# omit entries that are too long to embed
df["n_tokens"] = df.combined.apply(lambda x: len(encoding.encode(x)))
df = df[df.n_tokens <= max_tokens].tail(top_n)
print(len(df))
# get_embedding returns a list of floats, so it is stored as-is (float() on a list raises TypeError)
df["embedding"] = df.combined.apply(lambda x: get_embedding(x, engine=embedding_model))
df.to_csv("data/pm_embeddings.csv")
This produces the following error and does not create embeddings for my input data:
raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x16aeba830 state=finished raised AuthenticationError>]
I need to create embeddings to continue my project, but I do not understand how to fix these tenacity errors. Any help would be much appreciated.
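
For context, the RetryError in the traceback wraps an AuthenticationError, which suggests the retries were exhausted because every request was rejected for its credentials rather than because of tenacity itself. A minimal sketch of a pre-flight check follows, assuming the key is meant to come from the OPENAI_API_KEY environment variable (the legacy openai library reads it by default). The helper name find_api_key is hypothetical and not part of any library.

```python
import os


def find_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the stripped API key from an environment mapping, or None if missing or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    return key or None


if __name__ == "__main__":
    key = find_api_key()
    if key is None:
        print("OPENAI_API_KEY is not set; embedding requests will likely fail to authenticate.")
    else:
        # Show only a short prefix so the secret does not end up in logs.
        print(f"Found a key starting with {key[:3]}... ({len(key)} characters)")
```

If the check reports a key but the error persists, the key itself may be invalid, revoked, or carrying stray whitespace or quotes from a copy and paste.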