openai.APIConnectionError: Connection error

Hi, I'm facing a confusing problem.
I have a Flask application that uses the OpenAI API with GPT models (3.5 and 4).

Here is an example of a simple utility function I use:

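import json

from openai import OpenAI

# The OpenAI client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()


def ask_gpt(system_message, user_message, model):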
    """
    Generates a response from the GPT model based on the provided system and user messages.

    Args:
        system_message (str): The initial system message that sets the context.
        user_message (str): The user's query or message to which the GPT model responds.
        model (str): The specific GPT model to be used for generating the response.

    Returns:
        str: A JSON-formatted string containing the GPT model's response along with
             the input and output token counts, and the model used.
    """

    # Store system and user messages in a list for providing context to the model
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message}
    ]

    # Generate a response from the GPT model
    response = client.chat.completions.create(
        model=model,
        messages=messages
    )

    # Construct a JSON response with key details from the GPT model's response
    json_response = {
        "message_resp": response.choices[0].message.content,
        "input_tokens": response.usage.prompt_tokens,
        "output_tokens": response.usage.completion_tokens,
        "model": response.model,
    }

    return json.dumps(json_response, indent=2)

Locally everything works, but when I deploy my application to Kubernetes I get this error:
openai.APIConnectionError: Connection error.

Full error:

> 2024-02-19 16:40:15 ERROR: Exception on /<route path> [POST]
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 67, in map_httpcore_exceptions
>     yield
>   File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 231, in handle_request
>     resp = self._pool.handle_request(req)
>   File "/usr/local/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py", line 216, in handle_request
>     raise exc from None
>   File "/usr/local/lib/python3.9/site-packages/httpcore/_sync/connection_pool.py", line 196, in handle_request
>     response = connection.handle_request(
>   File "/usr/local/lib/python3.9/site-packages/httpcore/_sync/connection.py", line 99, in handle_request
>     raise exc
>   File "/usr/local/lib/python3.9/site-packages/httpcore/_sync/connection.py", line 76, in handle_request
>     stream = self._connect(request)
>   File "/usr/local/lib/python3.9/site-packages/httpcore/_sync/connection.py", line 122, in _connect
>     stream = self._network_backend.connect_tcp(**kwargs)
>   File "/usr/local/lib/python3.9/site-packages/httpcore/_backends/sync.py", line 213, in connect_tcp
>     sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
>   File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
>     self.gen.throw(typ, value, traceback)
>   File "/usr/local/lib/python3.9/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
>     raise to_exc(exc) from exc
> httpcore.ConnectError: [Errno -3] Lookup timed out
> 
> The above exception was the direct cause of the following exception:
> 
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.9/site-packages/openai/_base_client.py", line 918, in _request
>     response = self._client.send(
>   File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 915, in send
>     response = self._send_handling_auth(
>   File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 943, in _send_handling_auth
>     response = self._send_handling_redirects(
>   File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 980, in _send_handling_redirects
>     response = self._send_single_request(request)
>   File "/usr/local/lib/python3.9/site-packages/httpx/_client.py", line 1016, in _send_single_request
>     response = transport.handle_request(request)
>   File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 231, in handle_request
>     resp = self._pool.handle_request(req)
>   File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
>     self.gen.throw(typ, value, traceback)
>   File "/usr/local/lib/python3.9/site-packages/httpx/_transports/default.py", line 84, in map_httpcore_exceptions
>     raise mapped_exc(message) from exc
> httpx.ConnectError: [Errno -3] Lookup timed out
> 
> The above exception was the direct cause of the following exception:
> 
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1463, in wsgi_app
>     response = self.full_dispatch_request()
>   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 872, in full_dispatch_request
>     rv = self.handle_user_exception(e)
>   File "/usr/local/lib/python3.9/site-packages/flask_cors/extension.py", line 176, in wrapped_function
>     return cors_after_request(app.make_response(f(*args, **kwargs)))
>   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 870, in full_dispatch_request
>     rv = self.dispatch_request()
>   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 855, in dispatch_request
>     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
>   File "/<path>/routes.py", line 90, in s_code_get
>     service_code = get_service_code(service_name, service_codes)
>   File "/<path>/prompts.py", line 27, in get_service_code
>     <var>= json.loads(ask_gpt(system_message, service, 'gpt-3.5-turbo-1106'))
>   File "/<path>/openai.py", line 28, in ask_gpt
>     response = client.chat.completions.create(
>   File "/usr/local/lib/python3.9/site-packages/openai/_utils/_utils.py", line 275, in wrapper
>     return func(*args, **kwargs)
>   File "/usr/local/lib/python3.9/site-packages/openai/resources/chat/completions.py", line 663, in create
>     return self._post(
>   File "/usr/local/lib/python3.9/site-packages/openai/_base_client.py", line 1200, in post
>     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
>   File "/usr/local/lib/python3.9/site-packages/openai/_base_client.py", line 889, in request
>     return self._request(
>   File "/usr/local/lib/python3.9/site-packages/openai/_base_client.py", line 942, in _request
>     return self._retry_request(
>   File "/usr/local/lib/python3.9/site-packages/openai/_base_client.py", line 1013, in _retry_request
>     return self._request(
>   File "/usr/local/lib/python3.9/site-packages/openai/_base_client.py", line 942, in _request
>     return self._retry_request(
>   File "/usr/local/lib/python3.9/site-packages/openai/_base_client.py", line 1013, in _retry_request
>     return self._request(
>   File "/usr/local/lib/python3.9/site-packages/openai/_base_client.py", line 952, in _request
>     raise APIConnectionError(request=request) from err
> openai.APIConnectionError: Connection error.
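The traceback above is a single chain of exceptions linked by "raise ... from": the generic openai.APIConnectionError wraps httpx.ConnectError, which wraps httpcore.ConnectError carrying the real message, [Errno -3] Lookup timed out. That is a hostname-lookup (DNS) failure, not a refused or blocked connection. Below is a minimal standard-library sketch for logging that innermost cause; root_cause and describe_failure are hypothetical helper names, and the demo uses OSError and ConnectionError as stand-ins for the httpcore and openai exception classes.

```python
import logging


def root_cause(exc: BaseException) -> BaseException:
    """Walk the explicit `raise ... from` chain (__cause__) down to the innermost exception."""
    seen = {id(exc)}
    while exc.__cause__ is not None and id(exc.__cause__) not in seen:
        exc = exc.__cause__
        seen.add(id(exc))
    return exc


def describe_failure(exc: BaseException) -> str:
    """One-line summary of the outer error plus its root cause, suitable for a log line."""
    cause = root_cause(exc)
    return f"{type(exc).__name__}: {exc} (root cause: {type(cause).__name__}: {cause})"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Synthetic chain shaped like the traceback: DNS error -> generic connection error.
    try:
        try:
            raise OSError(-3, "Lookup timed out")
        except OSError as dns_error:
            raise ConnectionError("Connection error.") from dns_error
    except ConnectionError as err:
        # Logs: ConnectionError: Connection error. (root cause: OSError: [Errno -3] Lookup timed out)
        logging.error(describe_failure(err))
```

In the Flask route, wrapping the ask_gpt call in try/except and logging describe_failure(exc) before re-raising would put the DNS error in the pod logs directly, instead of only the generic "Connection error."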

Everything was working until it wasn't (no changes were made).
OpenAI library: latest (I upgraded from 1.5 to 1.12; neither works)

Troubleshooting:

  • Thought it could be a Kubernetes Ingress problem, so I tried a NodePort Service instead (same problem)

  • Executed a test script from inside the pod, and it appears to work

  • Tested DNS resolution with nslookup (the hostname resolves)

  • Changed the Python version and the openai package version

  • Restarted the cluster

The confusing part is that it works locally, on the k8s machine, and from inside the pod (or container), but not when the request comes in through the Service or Ingress.
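One plausible reason these checks disagree: nslookup and a standalone test script run as separate processes, so they do not exercise the resolver used by the gunicorn worker that actually serves Service/Ingress traffic. A minimal sketch of a resolution check meant to run inside the application process (for example from a temporary debug route, not shown); resolve and dns_report are hypothetical names, and api.openai.com is assumed as the host because it is the default base URL host of the openai client.

```python
import socket
from typing import Dict, List


def resolve(host: str, port: int = 443) -> List[str]:
    """Resolve host for a TCP connection and return the unique IP addresses.

    Raises socket.gaierror on failure; on Linux, errno -3 is EAI_AGAIN
    (a temporary resolution failure, as in the traceback above).
    """
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def dns_report(host: str = "api.openai.com") -> Dict[str, object]:
    """JSON-friendly result, e.g. to return from a throwaway debug endpoint."""
    try:
        return {"host": host, "ok": True, "addresses": resolve(host)}
    except socket.gaierror as err:
        return {"host": host, "ok": False, "errno": err.errno, "error": str(err)}
```

If dns_report() succeeds from a shell in the pod but returns ok: False when called through a request handled by gunicorn, the problem is in the worker process rather than in cluster DNS.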

K8s networking:

  • Calico
  • private cluster with NAT

After some troubleshooting, I found the cause: dnspython combined with gunicorn running the eventlet worker.
My Flask application couldn't resolve hostnames.
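This also explains why nslookup succeeded: it uses its own resolver, while an eventlet worker usually monkey-patches the socket module so that hostname lookups go through eventlet's green DNS code, which is built on dnspython. A minimal sketch for confirming which resolver a process is using; resolver_origin and resolver_is_patched are hypothetical helpers, and the exact module name reported after patching depends on the eventlet version.

```python
import socket


def resolver_origin() -> str:
    """Name of the module that currently provides socket.getaddrinfo."""
    return getattr(socket.getaddrinfo, "__module__", None) or "unknown"


def resolver_is_patched() -> bool:
    """True if something has replaced the standard-library getaddrinfo wrapper."""
    return resolver_origin() != "socket"


if __name__ == "__main__":
    # Call this from inside a gunicorn worker (not a separate shell) so the
    # result reflects any monkey-patching done by the worker class.
    print(f"getaddrinfo provided by: {resolver_origin()} (patched={resolver_is_patched()})")
```

In a plain Python process this reports the socket module; inside a patched worker it should point into eventlet instead.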

A quick fix is to downgrade dnspython:

  • Worked: 2.4.2
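Since the breakage appeared without code changes, a rebuild that pulled in a newer dnspython is a plausible trigger, so the direct step is pinning dnspython==2.4.2 in requirements.txt. As an extra safety net, a startup check along these lines could warn when the installed version is newer than the one reported to work here. This is a sketch: KNOWN_GOOD_DNSPYTHON, version_tuple, dnspython_warning and installed_version are hypothetical names, and version_tuple is a deliberately simplified parser (it ignores pre-release suffixes), not a substitute for packaging.version.

```python
from importlib import metadata
from typing import Optional, Tuple

# Version reported in this thread as working with the gunicorn eventlet worker.
KNOWN_GOOD_DNSPYTHON = "2.4.2"


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric parts of a version, e.g. "2.4.2" -> (2, 4, 2)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as "rc1"
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def dnspython_warning(installed: Optional[str], known_good: str = KNOWN_GOOD_DNSPYTHON) -> Optional[str]:
    """Return a warning if the installed dnspython is newer than the known-good version."""
    if installed is None:
        return None  # dnspython is not installed at all
    if version_tuple(installed) > version_tuple(known_good):
        return f"dnspython {installed} is newer than {known_good}, the version reported to work with eventlet"
    return None


def installed_version(package: str) -> Optional[str]:
    """Installed distribution version, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    warning = dnspython_warning(installed_version("dnspython"))
    if warning:
        print("WARNING:", warning)
```

It warns instead of refusing to start because the thread only establishes that 2.4.2 works, not exactly which later versions break.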