APIConnectionError or APIError


Hi, I am running the following code:

import os
import openai
import pandas as pd

# Set your OpenAI API key
openai.api_key = "[my API]"

# Load the original dataframe from a CSV file or any other source
df = pd.DataFrame(df)

# Split the dataframe into smaller chunks
chunk_size = 1000  # Number of rows per chunk
chunks = [df[i:i+chunk_size] for i in range(0, len(df), chunk_size)]

# Create an empty list to store the embeddings
embeddings = []

# Process each chunk separately
for chunk in chunks:
    # Iterate over the dataframe and create embeddings for each text
    for index, row in chunk.iterrows():
        response = openai.Embedding.create(input=row['text'], engine='text-embedding-ada-002')
        embedding = response['data'][0]['embedding']
        embeddings.append(embedding)

@retry(delay=1, backoff=2, max_delay=120)
def failsModeration(prompt: str) -> bool:
    return openai.Moderation.create(
        input=prompt
    )["results"][0]["flagged"]

# Assign the embeddings to the dataframe
df['embeddings'] = embeddings

# Save the dataframe with embeddings to a CSV file
df.to_csv('/Python_script/embeddings.csv', index=False)

The code worked with a small number of tokens (around 400k), but when I try to embed 8 million tokens, the process never completes. Each time I get one of these two errors:

    1. APIConnectionError: Error communicating with OpenAI: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
    2. APIError: Bad gateway.

How can I fix this so that the embedding completes?

Hi @reniaffer

Welcome to the OpenAI community.

This could be the DDoS protection kicking in, because at 8M tokens you are sending a very large number of requests in a short amount of time.

Also, what are your rate-limits for the particular embeddings model?
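If the failures are transient (dropped connections, bad gateway), one common mitigation is to send the texts in batches and retry a failed batch with exponential backoff. Below is a minimal sketch, not a definitive fix. The names `embed_with_retry` and `embed_fn` are hypothetical: `embed_fn` is any callable you supply that takes a list of strings and returns one vector per string. The defaults for batch size, retry count, and delays are assumptions, not values recommended by OpenAI.

```python
import time
from typing import Callable, List, Sequence

Vector = List[float]


def embed_with_retry(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[Vector]],  # hypothetical: your API wrapper
    batch_size: int = 100,      # assumed value, tune for your data
    max_retries: int = 5,       # assumed value
    base_delay: float = 1.0,    # seconds before the first retry
    max_delay: float = 120.0,   # upper bound on the wait between retries
    sleep: Callable[[float], None] = time.sleep,
) -> List[Vector]:
    """Embed texts in batches, retrying each failed batch with exponential backoff."""
    vectors: List[Vector] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        delay = base_delay
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_fn(batch))
                break  # this batch succeeded, move on to the next one
            except Exception:
                # Broad catch for brevity; in real use, narrow this to the
                # transient errors you actually see (connection, gateway, timeout).
                if attempt == max_retries:
                    raise  # out of retries: surface the real error
                sleep(delay)
                delay = min(delay * 2, max_delay)  # double the wait, capped
    return vectors
```

With the call from the original post, `embed_fn` could be something like `lambda batch: [d["embedding"] for d in openai.Embedding.create(input=batch, engine="text-embedding-ada-002")["data"]]`, assuming the response lists the embeddings in input order. Sending a list per request also means far fewer requests than one call per row.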

Hi,

I’ve decreased the number of chunks to 200, and now I receive the following error: “Timeout: Request timed out: HTTPSConnectionPool(host='api.openai.com', port=443): Read timed out. (read timeout=600)”.

I use text-embedding-ada-002. The limit is 3000 RPM / 250,000 TPM.
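Those limits put a floor on how long the job can take: 8,000,000 tokens at 250,000 tokens per minute is at least 32 minutes, however the requests are grouped. The arithmetic can be sketched as below. The function names are hypothetical, and the request count (40,000) and tokens per request (200) in the usage lines are made-up illustration values, since the thread does not state them.

```python
def min_runtime_minutes(total_tokens: int, n_requests: int,
                        tpm_limit: int, rpm_limit: int) -> float:
    """Lower bound on job duration given token and request rate limits."""
    token_bound = total_tokens / tpm_limit    # minutes needed by the TPM limit
    request_bound = n_requests / rpm_limit    # minutes needed by the RPM limit
    return max(token_bound, request_bound)    # the stricter limit wins


def seconds_between_requests(tokens_per_request: int,
                             tpm_limit: int, rpm_limit: int) -> float:
    """Smallest pause between requests that respects both limits."""
    by_rpm = 60.0 / rpm_limit
    by_tpm = 60.0 * tokens_per_request / tpm_limit
    return max(by_rpm, by_tpm)


# Limits from the post: 3000 RPM / 250,000 TPM.
# 40_000 requests and 200 tokens per request are hypothetical values.
print(min_runtime_minutes(8_000_000, 40_000, 250_000, 3000))
print(seconds_between_requests(200, 250_000, 3000))
```

A loop that fires requests faster than this spacing will hit the rate limit sooner or later, so pacing the calls (for example with `time.sleep`) is worth considering alongside retries.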

It looks like the API is taking too long to respond.

This could be caused by server-side issues.

Also, what is the token size per chunk?
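To answer that, the token total of each chunk can be measured before anything is sent. The sketch below is one way to do it under stated assumptions: `tokens_per_chunk` and `rough_token_count` are hypothetical helper names, and the default counter just counts whitespace-separated words, which is only a crude stand-in for the model's real tokenizer.

```python
from typing import Callable, List, Sequence


def rough_token_count(text: str) -> int:
    # Crude approximation (assumption): one token per whitespace-separated word.
    # Real token counts for the model will usually be higher.
    return len(text.split())


def tokens_per_chunk(
    texts: Sequence[str],
    chunk_size: int,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[int]:
    """Return the total token count of each consecutive chunk of texts."""
    return [
        sum(count_tokens(t) for t in texts[i:i + chunk_size])
        for i in range(0, len(texts), chunk_size)
    ]


# Tiny illustration with made-up texts and a chunk size of 2.
sample = ["first short text", "second", "a slightly longer third text"]
print(tokens_per_chunk(sample, chunk_size=2))
```

For accurate numbers, a real tokenizer can be passed in as `count_tokens`. With the `tiktoken` package installed, that would be something like `lambda t: len(tiktoken.encoding_for_model("text-embedding-ada-002").encode(t))`. In the original script, `texts` would be `df['text'].tolist()`.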