I’ve been learning to program for 5 months and have built a chat web app with OpenAI’s Python API. I used FastAPI and, in my most recent iteration, enabled asynchronous requests via the recently added async acreate method. I’m trying to understand how I would scale an application like this for production, with capacity for potentially thousands of users.
I know I could scale the app by deploying multiple instances, but I wonder whether that’s the most efficient approach, since it could get expensive quickly; perhaps there’s a way to maximize the capacity of each instance first. I just want to make sure the app itself is not the bottleneck and that I can scale efficiently.
- Does enabling async allow more concurrent requests within the same instance? I assume it does by definition, but is it essential for production?
- How would I test my app’s concurrency limits without racking up huge API costs?
- How many concurrent requests can I expect one instance of my app to handle before it bottlenecks and starts queuing requests?
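For the cost question, my current idea is to swap the API call for a fake async stream and load-test against that. This is only a sketch; `fake_acreate`, the token list, and the 0.05 s delay are all stand-ins I made up, not real OpenAI functions:

```python
import asyncio
import time

# Hypothetical stand-in for openai.ChatCompletion.acreate: it mimics the
# call shape (an awaitable returning an async iterator of chunks) but
# yields canned tokens with simulated latency, at zero API cost.
async def fake_acreate(model, messages, stream=True):
    async def _stream():
        for token in ["Hello", ", ", "world"]:
            await asyncio.sleep(0.05)  # pretend network delay per chunk
            yield {"choices": [{"delta": {"content": token}}]}
    return _stream()

async def handle_request() -> str:
    # Consume the fake stream the same way the real generator would
    response = await fake_acreate(model="gpt-3.5-turbo", messages=[], stream=True)
    parts = []
    async for chunk in response:
        content = chunk["choices"][0]["delta"].get("content", "")
        if content:
            parts.append(content)
    return "".join(parts)

async def load_test(n_concurrent: int):
    # Fire n_concurrent requests at once and time the whole batch
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request() for _ in range(n_concurrent)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(load_test(100))
print(f"100 concurrent requests in {elapsed:.2f}s")
```

If async concurrency works as I expect, 100 simulated requests should finish in roughly the time of one (~0.15 s of simulated streaming), not 100x that. Is this a reasonable way to probe the app’s limits?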
This is my async generator function:
from typing import List

import openai
from openai.error import RateLimitError  # error class in the pre-1.0 openai package

# `Message` is my Pydantic model (role/content fields), defined elsewhere

async def generate(messages: List[Message], model_type: str):
    try:
        # Await the async, streaming chat completion
        response = await openai.ChatCompletion.acreate(
            model=model_type,
            messages=[message.dict() for message in messages],
            stream=True
        )
        # Yield tokens to the client as they arrive
        async for chunk in response:
            content = chunk['choices'][0]['delta'].get('content', '')
            if content:
                yield content
    except RateLimitError as e:
        # Pass the rate-limit message through to the client instead of crashing
        yield f"{str(e)}"