Web implementation and keeping the API key private?

If you have a Flask server running a Python app which makes the API calls, there should be no slowdown when that app makes API calls and relays them to the HTML page; that's how I build my streaming chatbot interfaces all the time. Most of my use cases run on a VPS rather than an AWS or other cloud container, though, and I only use Azure or DigitalOcean for streaming apps.

How is your server hosted?

Hi Spencer,

It is hosted automatically through my Azure Web App; I'm not sure what they are using under the hood. I'll go back and measure the timing at various points and see if I can pinpoint the issue. It's good to know you are successfully using this technique. So every streaming chunk is requested by the HTML page from the Flask server, and the Flask server calls the OpenAI API to stream the next chunk and then returns it to the HTML page. Is that correct?



Correct, my Python Flask app is making the API calls, handling the SSE events, and serving the web pages; then I have a bit of JS in the served HTML that handles the display and concatenation of the streaming deltas.

This is the Flask app:

import os
import json
import logging
from flask import Flask, render_template, request, Response
import openai

app = Flask(__name__)
app.config['DEBUG'] = True
API_KEY = os.getenv("OPENAI_API_KEY")
openai.api_key = API_KEY


def stream_openai_response(prompt):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # model name assumed; the original snippet omitted the required model parameter
        logit_bias={"22046": -100},
        temperature=0.5,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # needed to receive the response as a stream of deltas
    )
    for event in response:
        yield event


@app.route('/')
def home():
    return render_template('index.html')


@app.route('/get_response', methods=['POST'])
def get_response():
    user_input = request.form['user_input']
    return Response(generate(user_input), content_type='text/event-stream',
                    headers={'Cache-Control': 'no-cache'})


def generate(user_input):
    client = stream_openai_response(user_input)
    separator = chr(31) * 3  # three unit-separator characters delimit each chunk
    for event in client:
        try:
            choice = event['choices'][0]['delta']
            if 'content' in choice:
                yield f"data:{choice['content']}{separator}"
            elif 'role' in choice and choice['role'] == 'assistant':
                yield f"data:{separator}"
        except json.JSONDecodeError as e:
            logging.error(f"JSONDecodeError: {e}")

    # signal the client that the stream is finished
    yield f"data:___DONE___{separator}"


@app.errorhandler(400)
def bad_request_error(error):
    logging.error(f"Bad Request Error: {error}")
    return 'Bad Request', 400


if __name__ == '__main__':
    app.run(host='', port=80)
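On the receiving end, the served page just splits the incoming stream on the three-character separator and appends each completed delta to the chat display. The same parsing logic can be sketched in Python (the function name and buffer handling here are illustrative, not taken from Spencer's actual JS):

```python
SEPARATOR = chr(31) * 3  # must match the separator used by the Flask generator


def split_stream_chunks(buffer: str):
    """Split accumulated stream text into complete deltas plus a leftover tail.

    Each complete delta is terminated by SEPARATOR; anything after the last
    separator is an incomplete chunk that should be kept for the next read.
    """
    parts = buffer.split(SEPARATOR)
    complete, tail = parts[:-1], parts[-1]
    # strip the "data:" prefix that the server prepends to every chunk
    deltas = [p[len("data:"):] if p.startswith("data:") else p for p in complete]
    return deltas, tail


# example: two complete deltas and one partial chunk still in flight
buf = f"data:Hello{SEPARATOR}data: world{SEPARATOR}data:par"
deltas, tail = split_stream_chunks(buf)
# deltas == ["Hello", " world"], tail == "data:par"
```

Buffering the tail is what makes the display robust when a network read ends mid-chunk.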

This is a bit late, but for any future readers: using Vercel's env vars doesn't solve the problem, since the JS code runs in the browser, so any key it uses is visible to anyone who inspects the page.

Serverless functions might have been a solution at one point, but now that OpenAI accepts and returns non-text content such as CSVs, images, and voice, serverless functions are not a viable solution.

I’m searching for an answer myself.
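For readers hitting the same limitation, the usual workaround is the pattern Spencer uses above: an always-on backend that holds the key in an environment variable and relays requests, including binary bodies, so the key never reaches the browser. A minimal sketch of such a relay (the `/proxy/...` route and the upstream URL handling are illustrative, not from any poster's actual code):

```python
import os

import requests
from flask import Flask, Response, request

app = Flask(__name__)
API_KEY = os.getenv("OPENAI_API_KEY")  # lives on the server, never shipped to the browser


@app.route('/proxy/<path:upstream_path>', methods=['POST'])
def proxy(upstream_path):
    # Forward the raw request body (text, CSV, image, audio, ...) upstream;
    # the Authorization header is attached server-side so the client never
    # sees the key.
    upstream = requests.post(
        f"https://api.openai.com/{upstream_path}",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": request.headers.get(
                "Content-Type", "application/octet-stream"
            ),
        },
        data=request.get_data(),
        stream=True,
    )
    # Relay the upstream response back, streamed, with its content type intact.
    return Response(
        upstream.iter_content(chunk_size=8192),
        status=upstream.status_code,
        content_type=upstream.headers.get("Content-Type"),
    )
```

Because the relay just passes bytes through, it handles non-text payloads that a size-limited serverless function cannot.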

I use Vercel, and I created a simple API in JavaScript which only receives HTTP requests and replies with the API key, retrieving it from an environment variable.