Assistants API Response Is Extremely Slow

Hello,

I am using the OpenAI Assistants API with the gpt-4-turbo model, and for the past few days responses have been significantly slower than usual. I have checked other endpoints (e.g., /v1/completions), and they respond at normal speed, but the Assistants API continues to be slow.

I am on OpenAI Usage Tier 5, and I'm wondering whether this issue is related to a rate limit. Could you please clarify whether Tier 5 users are subject to any specific rate limits on the Assistants API? When I checked the response headers from the completions API, they did not indicate that a rate limit was in effect.
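
For reference, the API reports rate limits in `x-ratelimit-*` response headers. The following is a minimal sketch of picking them out of a response, assuming the headers are available as a plain mapping. `summarize_rate_limits` is a hypothetical helper, not part of the openai library, and the sample values are made up for illustration.

```python
from typing import Mapping

# Rate-limit header names as documented for the OpenAI API.
# Which of them appear on a given endpoint is an assumption to verify.
RATE_LIMIT_HEADERS = (
    "x-ratelimit-limit-requests",
    "x-ratelimit-remaining-requests",
    "x-ratelimit-reset-requests",
    "x-ratelimit-limit-tokens",
    "x-ratelimit-remaining-tokens",
    "x-ratelimit-reset-tokens",
)


def summarize_rate_limits(headers: Mapping[str, str]) -> dict:
    """Return only the rate-limit headers that are present (hypothetical helper)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return {name: lowered[name] for name in RATE_LIMIT_HEADERS if name in lowered}


# Illustrative values only, not real measurements.
sample = {
    "Content-Type": "application/json",
    "x-ratelimit-limit-requests": "10000",
    "x-ratelimit-remaining-requests": "9999",
}
summary = summarize_rate_limits(sample)
print(summary)
```

With the openai Python library, the headers of a `with_raw_response` call should be usable as the input mapping. If the "remaining" values stay close to the "limit" values, a rate limit is unlikely to explain the slowness.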

Please let me know if you need any additional information. I look forward to your prompt response.

Thank you.

At Tier 5, you should be neither subject to tier limitations nor given higher-latency models.

Hitting a rate limit does not slow responses down; it is simply a shut-off, and the API refuses to run your request.

We can’t do much about model production speed - you get what you get.


I ran a benchmark on this model six days ago, so I can compare those results with the current ones.

For 5 trials of gpt-4-turbo @ 2024-10-21 01:32AM:

| Stat | Average | Cold | Minimum | Maximum |
| --- | --- | --- | --- | --- |
| stream rate | Avg: 34.420 | Cold: 31.7 | Min: 31.7 | Max: 39.7 |
| latency (s) | Avg: 0.775 | Cold: 0.7868 | Min: 0.68 | Max: 0.9319 |
| total response (s) | Avg: 15.713 | Cold: 16.8842 | Min: 13.6792 | Max: 16.8842 |
| total rate | Avg: 32.766 | Cold: 30.324 | Min: 30.324 | Max: 37.429 |
| response tokens | Avg: 512.000 | Cold: 512 | Min: 512 | Max: 512 |

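The earlier test was run while other models were underperforming.
Then: 27 tokens per second. Right now: 32.8. In other words, gpt-4-turbo on Chat Completions is currently faster than it was six days ago, not slower.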
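
To show where these rates come from, here is a small worked sketch using the "Cold" row figures above. The two formulas mirror the ones in the benchmark script further down (tokens divided by total time, and tokens after the first divided by time after the first token). The function names are hypothetical and chosen only for this illustration.

```python
def total_rate(tokens: int, total_s: float) -> float:
    """Tokens per second over the whole request, including latency."""
    return round(tokens / total_s, 3)


def stream_rate(tokens: int, total_s: float, latency_s: float) -> float:
    """Tokens per second after the first token has arrived."""
    streaming_s = total_s - latency_s
    return round((tokens - 1) / (streaming_s if streaming_s else 1), 1)


# Cold-run figures from the table: 512 tokens, 16.8842 s total, 0.7868 s latency
cold_total = total_rate(512, 16.8842)
cold_stream = stream_rate(512, 16.8842, 0.7868)

# Relative change from the earlier 27 tokens/s to the current 32.8 tokens/s
change_pct = round((32.8 - 27) / 27 * 100, 1)

print(cold_total, cold_stream, change_pct)
```

The results reproduce the cold total rate (30.324) and cold stream rate (31.7) in the table, and put the improvement over the earlier test at roughly 21.5%.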

The request is approximately 1600 tokens in (cacheable on gpt-4o) and 512 tokens out.


Try-it-yourself Python benchmark code
  • Find and edit the model list, here replaced by just gpt-4-turbo
import openai  # requires pip install openai
import tiktoken  # requires pip install tiktoken
import time
import json

## Your test parameters
trials = 5
max_tokens = 512
prompt = "Write an extensive article about kittens, 30 paragraphs in length."
history = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": (
                    "You are a creative author. "
                    "You are powered by an AI model with a 128k context window for responses."
                ),
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": (
                    "Write five different poems about kittens, each with six stanzas."
                ),
            }
        ],
    },
    {
        "role": "assistant",
        "content": [
            {
                "type": "text",
                "text": """
In approaching the task of writing five poems about kittens, each with six stanzas, I wanted to explore a variety of themes, styles, and poetic forms to capture the many facets of these delightful creatures.

1. **Playful Antics**: For the first poem, I aimed for a light-hearted, rhyming style to depict the mischievous and energetic behavior of kittens as they play and explore their surroundings.

2. **The Bond with Humans**: The second poem delves into the emotional connection between kittens and their human companions, highlighting moments of tenderness and trust through a gentle, reflective tone.

3. **A Kitten's First Adventure**: In the third poem, I chose a narrative approach from the kitten's perspective, illustrating the curiosity and wonder of experiencing the world outside for the first time.

4. **Nature's Elegance**: The fourth poem focuses on the grace and beauty of kittens, using vivid imagery and metaphors to compare them to elements of nature, aiming for a more lyrical and contemplative style.

5. **The Nighttime Wanderer**: For the final poem, I explored the mysterious side of kittens as they roam and observe the world at night, employing an atmospheric and slightly whimsical tone.

By varying the subjects and styles, I hope to showcase the multifaceted charm of kittens through these poems.

---

**1. Playful Antics**

*Stanza 1:*

In morning's glow, the kitten wakes,  
With stretching paws and tiny shakes.  
She spies a feather on the floor,  
A new adventure to explore.

*Stanza 2:*

She pounces quick and bats the air,  
Her playful antics without care.  
A tumble here, a somersault,  
The world her stage without a fault.

*Stanza 3:*

The yarn unravels 'cross the room,  
She chases threads upon a loom.  
Her tiny claws catch hold and cling,  
Entangled in the messy string.

*Stanza 4:*

A sudden noise, she stops to stare,  
Then darts beneath the rocking chair.  
Peek-a-boo, she pops her head,  
Then off she scurries under bed.

*Stanza 5:*

She leaps atop the window sill,  
To watch the birds with quiet thrill.  
Her tail flicks fast with great delight,  
Imagining her soaring flight.

*Stanza 6:*

As shadows stretch and day grows dim,  
Her eyelids droop on fading whim.  
Curled cozy in a soft retreat,  
She dreams until again they meet.

---

**2. The Bond with Humans**

*Stanza 1:*

Upon my lap she softly lies,  
A tiny warmth that magnifies.  
Her purring like a gentle hum,  
A silent song without a drum.

*Stanza 2:*

Her eyes meet mine in quiet gaze,  
A trust that's built in countless ways.  
I stroke her fur with tender care,  
A moment pure we simply share.

*Stanza 3:*

She follows close where'er I go,  
A shadow in the evening glow.  
Her presence like a soothing balm,  
A constant source of gentle calm.

*Stanza 4:*

When teardrops fall from troubled eyes,  
She senses all that I disguise.  
With nuzzles soft, she lifts the weight,  
Her simple love can heal and sate.

*Stanza 5:*

In playful times she brings me cheer,  
Her joyful leaps erase all fear.  
A chase, a catch, a happy spin,  
She shows me how to laugh again.

*Stanza 6:*

Our bond is deep, unspoke, profound,  
In her, a loyal friend I've found.  
A kitten's love, a precious part,  
Forever held within my heart.

---

**3. A Kitten's First Adventure**

*Stanza 1:*

The door ajar, a world untold,  
She steps into the morning cold.  
Her whiskers twitch with scents anew,  
A vast domain comes into view.

*Stanza 2:*

The grass beneath her paws is wet,  
A curious and strange vignette.  
She pats a leaf and watches it sway,  
Then scurries off to join the day.

*Stanza 3:*

A butterfly flutters nearby,  
She leaps but cannot reach the sky.  
Undaunted by the fleeting prize,  
She tracks a beetle with her eyes.

*Stanza 4:*

The garden blooms with colors bright,  
A dazzling feast within her sight.  
She sniffs a rose and sneezes twice,  
Surprised by petals soft and nice.

*Stanza 5:*

A babbling brook sings gentle tunes,  
Reflecting rays of sunlit noons.  
She dips a paw and feels the chill,  
Then quickly shakes with sudden thrill.

*Stanza 6:*

As shadows stretch and birds take flight,  
She feels the tug of coming night.  
She turns toward home, her quest complete,  
And finds her bed so soft and sweet.

---

**4. Nature's Elegance**

*Stanza 1:*

Grace embodied in a tiny frame,  
A creature wild yet gently tame.  
She moves with ease through dappled light,  
A silent shadow, swift and slight.

*Stanza 2:*

Her fur a tapestry of hues,  
Reflecting golds and silvery blues.  
Like morning mist or evening shade,  
In her, the artist's touch is laid.

*Stanza 3:*

Her eyes are pools of liquid green,  
Where secrets of the woods are seen.  
They shimmer with an ancient lore,  
A wisdom from the days of yore.

*Stanza 4:*

She climbs the oak with nimble stride,  
To sit where earth and sky collide.  
Above the world, she takes her throne,  
A queen of realms yet unknown.

*Stanza 5:*

The wind whispers through her fur,  
A melody meant just for her.  
She listens to the songs untold,  
Embracing mysteries they hold.

*Stanza 6:*

In her, the wild and tame unite,  
A harmony of dark and light.  
A kitten born of earth and sky,  
Nature's elegance passing by.

---

**5. The Nighttime Wanderer**

*Stanza 1:*

When twilight paints the sky in grey,  
The kitten wakes to start her play.  
The moon her lantern in the dark,  
She ventures out to make her mark.

*Stanza 2:*

She slips into the night's embrace,  
A silent figure full of grace.  
The stars above her guiding lights,  
She dances through the velvet nights.

*Stanza 3:*

The world asleep, she roams alone,  
Each street and shadow hers to own.  
The whispers of the midnight air,  
Are secrets that she longs to share.

*Stanza 4:*

She pauses by the old stone wall,  
To hear the distant night bird's call.  
A harmony of solitude,  
In which her spirit is renewed.

*Stanza 5:*

Reflections in the puddles gleam,  
Distorted ripples of a dream.  
She gazes at the mirrored skies,  
A universe within her eyes.

*Stanza 6:*

As dawn approaches, soft and slow,  
She feels the sunrise's gentle glow.  
Returning home, her wander ceased,  
She curls up tight and sleeps in peace.

---

By exploring different themes and styles, these poems aim to capture the playful, affectionate, curious, elegant, and mysterious aspects of kittens, celebrating their unique place in our lives and hearts.
""".strip(),
            }
        ],
    },
]


models = ['gpt-4o-2024-08-06', 'gpt-4o-2024-05-13']  # example list for comparing models
models = ['gpt-4-turbo']  # overrides the list above; edit this line to choose models

class Tokenizer:
    def __init__(self, encoder="o200k_base"):
        self.tokenizer = tiktoken.get_encoding(encoder)

    def count(self, text, encoder="o200k_base"):
        # switch encoders per call so each model is counted with its own tokenizer
        self.tokenizer = tiktoken.get_encoding(encoder)
        return len(self.tokenizer.encode(text))

class Tokenizer100:
    def __init__(self, encoder="cl100k_base"):
        self.tokenizer = tiktoken.get_encoding(encoder)

    def count(self, text):
        return len(self.tokenizer.encode(text))

class BotDate:
    def __init__(self):
        self.created_time = time.time()
        self.start_time = 0

    def start(self):
        return time.strftime("%Y-%m-%d %I:%M%p", time.localtime(self.created_time))

    def now(self):
        return time.strftime("%Y-%m-%d %I:%M%p", time.localtime(time.time()))

    def set(self):
        self.start_time = time.time()

    def get(self):
        return round(time.time() - self.start_time, 4)

client = openai.Client(timeout=120, max_retries=0)  # uses OPENAI_API_KEY env variable
bdate = BotDate()
tok = Tokenizer()
latency_s = 0  # time to first streamed content; updated inside the streaming loop
stats = {model: {"stream rate": [], "latency (s)": [],"total response (s)": [],
                 "total rate": [],
                 "response tokens": [],} for model in models}

for i in range(trials):  # number of trials per model
    for model in models:
        if model[4:6] == "4o":
            token_encoder = "o200k_base"
        else:
            token_encoder = "cl100k_base"
        bdate.set()  # start timer
        response = None

        if model.endswith("instruct"):  # instruct models use the completions endpoint
            # API request from completions
            try:
                response = client.completions.with_raw_response.create(
                    prompt=prompt + "\n\nassistant: ",
                    model=model, top_p=0.0001, stream=True, max_tokens=max_tokens+1)
            except Exception as e:
                print(f"{model}: {e}")
                continue
        else:
            # API request from chat completions
            try:
                response = client.chat.completions.with_raw_response.create(
                    messages=history + [{"role": "user", "content": prompt}],
                    model=model, top_p=0.0001, stream=True, max_tokens=max_tokens,
                    #stream_options={"include_usage": True}
                    )
            except Exception as e:
                print(f"{model}: {e}")
                continue

        q = response.parse()
        print(f"\n{q.__class__.__name__}:{model}", end="")

        reply = ""  # string to collect response tokens
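        for chunk_no, chunk in enumerate(q):
            if reply == "":
                # keeps updating until the first content arrives (time to first token)
                latency_s = bdate.get()
            # skip chunks without choices, and the final chunk carrying finish_reason
            if q.response.is_success and chunk.choices and not chunk.choices[0].finish_reason:
                if q.response.url.path.startswith("/v1/chat"):
                    reply += chunk.choices[0].delta.content or ""  # chat chunks; content may be None
                else:
                    reply += chunk.choices[0].text or ""  # completion chunks
                print(".", end="")  # progress indicator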
        print(chunk)
        total_s = bdate.get()  # timer end
        # extend model stats lists with total, latency, tokens for model
        stats[model]["latency (s)"].append(round(latency_s,4))
        stats[model]["total response (s)"].append(round(total_s,4))
        tokens = tok.count(reply, token_encoder)
        stats[model]["response tokens"].append(tokens)
        stats[model]["total rate"].append(round(tokens/total_s, 3))
        stats[model]["stream rate"].append(round((tokens-1)/(1 if (total_s-latency_s) == 0 else (total_s-latency_s)), 1))

print("\n")
for key in stats:
    print(f"### For {trials} trials of {key} @ {bdate.now()}:")
    print("| Stat |  Average | Cold | Minimum | Maximum |")
    print("| --- | --- | --- | --- | --- |")
    for sub_key in stats[key]:
        values = stats[key][sub_key]
        cold = values[0]
        min_value = min(values)
        max_value = max(values)
        avg_value = sum(values) / len(values)
        print(f"| {sub_key} | Avg: {avg_value:.3f} | Cold: {cold} | Min: {min_value} | Max: {max_value} | ")
    print()