Hi, I am running the following code:
import os
import openai
import pandas as pd
from retry import retry  # needed for the @retry decorator used below
# Set your OpenAI API key
openai.api_key = "[my API]"
# Load the original dataframe from a CSV file or any other source
df = pd.read_csv('input.csv')  # replace 'input.csv' with your source file
# Split the dataframe into smaller chunks
chunk_size = 1000 # Number of rows per chunk
chunks = [df[i:i+chunk_size] for i in range(0, len(df), chunk_size)]
# Create an empty list to store the embeddings
embeddings = []
# Retry transient API failures with exponential backoff
@retry(delay=1, backoff=2, max_delay=120)
def failsModeration(prompt: str) -> bool:
    return openai.Moderation.create(
        input=prompt
    )["results"][0]["flagged"]

# Process each chunk separately
for chunk in chunks:
    # Iterate over the chunk and create an embedding for each text
    for index, row in chunk.iterrows():
        response = openai.Embedding.create(input=row['text'], engine='text-embedding-ada-002')
        embedding = response['data'][0]['embedding']
        embeddings.append(embedding)
# Assign the embeddings to the dataframe
df['embeddings'] = embeddings
# Save the dataframe with embeddings to a CSV file
df.to_csv('/Python_script/embeddings.csv', index=False)
The code worked with a small number of tokens (around 400k), but once I try to embed around 8 million tokens I cannot complete the process. Each time I get one of these two errors:
- APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
- APIError: Bad gateway.
How can I fix this so the embedding can run to completion?
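For context, here is a minimal sketch of the retry-with-backoff pattern I am thinking of applying around the openai.Embedding.create call (call_with_retries and flaky_call are hypothetical names I made up for illustration; flaky_call just simulates the transient connection errors, no real API is contacted):

```python
import time

def call_with_retries(fn, max_attempts=5, delay=1, backoff=2, max_delay=120):
    """Call fn(), retrying on ConnectionError with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except ConnectionError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up after max_attempts consecutive failures
            time.sleep(min(delay, max_delay))
            delay *= backoff  # exponential backoff: 1s, 2s, 4s, ...

# Simulated flaky API call: fails twice, then succeeds on the third attempt
calls = {"n": 0}
def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Remote end closed connection without response")
    return {"data": [{"embedding": [0.1, 0.2]}]}

result = call_with_retries(flaky_call, delay=0.01)
```

In the real loop, fn would be a lambda wrapping the embedding request for one row, so a dropped connection retries just that row instead of aborting the whole 8-million-token run.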