Mystery model popped up on lmsys gpt2-chatbot - gpt4.5?

mystery model popped up on lmsys, rate limit 8 per day though so be warned

Use the chat link and choose gpt2-chatbot

Some great detail here - gpt2-chatbot - and on Hacker News ("GPT-4.5 or GPT-5 being tested on LMSYS?") and Reddit as well.

Many people calling this gpt4.5


There is more info here - gpt2-chatbot


It makes a similar coding error to ChatGPT (gpt-4-turbo) on an apparently not-so-simple task that I tried on several models in the linked topic, at a token production rate reminiscent of gpt-4 a year ago. This lmsys AI uses a ChatGPT prompt, so replicating the prior input is easy; top_p was again set to 0.1.


Here’s a line of Python code with the intention that just the start and the end of a long string is printed, inserting an elide marker in the middle. However, if the string is instead short, the output shown has overlap of repeated text.


Replace this inline script snippet by writing a version that can never overlap the text, and does not elide if length is under the start + end text length. Use clever pythonic techniques focused on efficiency, not tutorial.


A more efficient and correct approach that avoids text overlap and unnecessary elision for shorter strings can be implemented as follows:

print(f"{transcribed_text[:200]}{'\n…\n' if len(transcribed_text) > 400 else ''}{transcribed_text[-200:] if len(transcribed_text) > 200 else transcribed_text}")


  1. String Slicing and Length Check:

    • The script uses transcribed_text[:200] to capture the first 200 characters of the string.
    • It checks if the total length of transcribed_text exceeds 400 using len(transcribed_text) > 400. If it does, it inserts the elision marker (\n…\n).
    • The end segment transcribed_text[-200:] is conditionally appended only if the length of transcribed_text exceeds 200 characters. This avoids the case where the string length is between 200 and 400, which would otherwise repeat the text or show an irrelevant elision marker.
  2. Efficiency and Pythonic Style:

    • This approach minimizes the operations performed on the string by using length checks to dictate the format.
    • It uses Python’s f-string formatting, which is efficient for creating strings.
    • The conditional expressions inside the f-string ({... if ... else ...}) are concise and execute without requiring additional conditional (if-else) blocks outside the print statement.
  3. Behavior:

    • If len(transcribed_text) <= 200: Prints the text as is.
    • If 201 <= len(transcribed_text) <= 400: Prints the first 200 characters and the last part without any overlap or elision marker.
    • If len(transcribed_text) > 400: Prints the first 200 characters, an elision marker, and the last 200 characters, ensuring there’s no overlap.

This solution is optimal for scenarios where the string length varies significantly, efficiently handling both short and long strings with appropriate formatting.

User (having the AI analyze what it wrote, with an approach that made ChatGPT find the error):

Describe the printing behavior, with careful analysis, if transcribed_text were 300 characters in length.


If transcribed_text is 300 characters in length, let’s analyze how the proposed Python line will handle printing:

print(f"{transcribed_text[:200]}{'\n…\n' if len(transcribed_text) > 400 else ''}{transcribed_text[-200:] if len(transcribed_text) > 200 else transcribed_text}")

Here’s the step-by-step explanation of what happens:

  1. First Part (transcribed_text[:200]):

    • This extracts the first 200 characters of transcribed_text.
    • Since the string is 300 characters long, this operation successfully retrieves the first 200 characters.
  2. Conditional Elision Marker ({'\n…\n' if len(transcribed_text) > 400 else ''}):

    • This part checks if the length of transcribed_text is greater than 400 to decide whether to include an elision marker (\n…\n).
    • Given that the string is only 300 characters long, the condition len(transcribed_text) > 400 evaluates to False.
    • Therefore, no elision marker is inserted.
  3. Second Part ({transcribed_text[-200:] if len(transcribed_text) > 200 else transcribed_text}):

    • This extracts the last 200 characters of transcribed_text.
    • Since the string is 300 characters long, transcribed_text[-200:] starts from the 101st character to the end (300th character), effectively capturing the last 200 characters.
    • The condition len(transcribed_text) > 200 is True (because 300 > 200), so it correctly decides to use transcribed_text[-200:].

Final Output Composition:

Given the string length of 300, the output will concatenate:

  • The first 200 characters.
  • Directly followed by the last 200 characters, because there is no elision marker.

Example Analysis:

Let’s assume transcribed_text is “abcdefghij” repeated 30 times, thus making it exactly 300 characters long (since “abcdefghij” is 10 characters, repeated 30 times is 300 characters).

  • transcribed_text[:200] will give us “abcdefghij” repeated 20 times.
  • transcribed_text[-200:] will give us “abcdefghij” starting from the 11th full repetition to the 30th, which is again “abcdefghij” repeated 20 times.



Here, the output essentially prints the first 200 characters followed by the last 200 characters, with no separation or elision, resulting in an evident visual overlap of content from characters 101 to 200 with part of what is repeated as last 200 characters.

This approach mistakenly assumes the last segment needs to be shown only if greater than 200 characters, which isn’t a logical necessity for clarity or unique content display. The overlap in the middle (characters 101-200 being repeated in the printout) highlights a flaw in the logic for handling cases where the string length is between double the desired head/tail lengths (400 in this case) and the individual segment length (200 here). Adjusting the logic to better handle middle-sized strings (like 300 characters) without such redundancy or overlap would involve more nuanced conditional handling or redesign of the character extraction logic.
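For anyone who would rather check that analysis than take the AI's word for it, here is a minimal sketch that rebuilds the same three pieces the one-liner concatenates, using the 300-character example string from the analysis. The variable names (`head`, `tail`, `marker`, `overlap`) are mine, not from the model output. The pieces are built outside an f-string on purpose: as far as I know, the quoted one-liner puts a backslash inside an f-string expression, which is a SyntaxError before Python 3.12.

```python
# Reproduce the 300-character case from the analysis above.
text = "abcdefghij" * 30            # exactly 300 characters

head = text[:200]                   # characters 1-200
tail = text[-200:] if len(text) > 200 else text   # characters 101-300 here
marker = "\n…\n" if len(text) > 400 else ""       # empty for 300 characters

output = head + marker + tail       # what the one-liner would print

# Characters that end up printed twice
overlap = len(head) + len(tail) - len(text)

print(len(output))                  # 400 characters printed for a 300-character input
print(overlap)                      # 100 characters (101-200) are repeated
```

So the 201-400 range claimed as "without any overlap" in the first answer is exactly the range where the overlap happens.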

Again the same behavior is seen as with ChatGPT. Notably, there is no responsibility taken, or even awareness or attention, that the code being analyzed was written by the AI immediately before; no “my bad”.

So if it seemed “smart”, that is to people who already had lowered expectations of a ChatGPT output, not to those who used gpt-4 at release.


They are likely using the API, so memory is probably not a thing with lmsys.

The chat interface of lmsys does indeed have a lossless conversation memory, and one can run it up to a model’s maximum context length to finally get an error.

Without that, the AI would simply have no clue what a user input "Describe the printing behavior, with careful analysis, if transcribed_text were 300 characters in length." was talking about.

You could be right, but I wouldn’t rely on the memory implementation, as that could be a function of lmsys. One-off prompts are good. How does the original prompt compare to gpt4?

The response to the exact same characters of system and user input by GPT-4-0314 on the API was a functional solution to printing a text elision, without a lot of embellishment of AI chat. It is the first response that I demonstrated.

Without a library that does this insertion of central ellipsis automatically on things not practical to print (such as printing a numpy vector), that metacode would be:

total printing output allowed = 400
my text is = length(input text)
if my text is less than the total print output allowed:
- show the whole text
otherwise:
- from the input text, show half the total print allowed from the start, and show half the total print allowed from the end, with the elide indicating omission printed between them.
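A minimal Python sketch of that metacode, for reference. The function name `elide_middle`, its `total` and `marker` parameters, and the `\n…\n` marker default (borrowed from the model output earlier in the thread) are my own assumptions, not anything a library provides. It returns the string instead of printing so it can be checked, and it treats a text of exactly `total` characters as short enough to show whole.

```python
def elide_middle(text: str, total: int = 400, marker: str = "\n…\n") -> str:
    """Return text unchanged if it fits in `total` characters; otherwise keep
    half the budget from each end with `marker` between the two halves."""
    if len(text) <= total:
        return text                      # short text: no elision, no overlap
    half = total // 2
    # len(text) - half avoids the text[-0:] pitfall when half is 0
    return f"{text[:half]}{marker}{text[len(text) - half:]}"


sample = "abcdefghij" * 30               # 300 characters
print(elide_middle(sample) == sample)    # True: shown whole

long_sample = "abcdefghij" * 50          # 500 characters
print(len(elide_middle(long_sample)))    # 403 = 200 + 3-character marker + 200
```

Because the head and tail are only sliced when the text is longer than their combined length, they can never overlap. The inline form the original prompt asked for would be something like `print(t if len(t) <= 400 else f"{t[:200]}\n…\n{t[-200:]}")`, which keeps the backslashes in the literal part of the f-string.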

I typed it into the forum here - ChatGPT at least can translate that into corresponding python (chat share)

ChatGPT with no tools on can perform “Write Python code to print a string using optional central elision to limit the output to 400 characters.” - but as a function and example call and 400 characters of example input tokens for you to pay for.

Just a particular case where expectations of saving a few minutes were not met.

The output of the “mystery model” is too similar to ChatGPT to consider it an advancement. It is just a tweak-up to the release gpt-4-turbo that failed in the exact same manner at the same turns as gpt-3.5-turbo, to a chat session “write a snake game”.

Hmm, interesting. I get something different

And on playground with gpt-4-turbo

To ensure a Python script meets the given constraints—efficiently printing the start and end of a transcribed_text without overlap or unnecessary elision when the text is short—we can implement a solution using conditional checks and string slicing. Here’s how you can achieve that using Python:

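def print_text_segment(transcribed_text, limit=200):
    # Calculate the length of the transcribed_text
    text_len = len(transcribed_text)
    # Check if the total length is less than twice the limit (for start and end)
    if text_len <= 2 * limit:
        # Short enough that start and end would overlap: print the whole text
        print(transcribed_text)
    else:
        # The pasted code was cut off above; this branch is reconstructed from the
        # numbered explanation below, and the marker text is an assumption
        print(f"{transcribed_text[:limit]}\n…\n{transcribed_text[text_len - limit:]}")

# Example usage:
transcribed_text = "Your very long or short text here."
print_text_segment(transcribed_text, 200)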


  1. Function Definition: The function, print_text_segment, is defined with two parameters: transcribed_text (the string to process) and limit (the length from the start and end of the string to display, defaulting to 200 characters).

  2. Calculate Length: It first determines the length of transcribed_text.

  3. No Overlapping Condition: Before it prints anything, it checks whether the total length of transcribed_text is less than or equal to twice the limit. If it’s true, it means including both the start and end parts of the text along with an elision would overlap or be redundant, hence it prints the whole text directly.

  4. Conditional Elided Display: If the string is longer than twice the limit, it slices the string to take the first and last limit characters, respectively, and adds an elide mark in between. This ensures no overlap of the text and distinct start and end sections are always shown if the text is quite long.

This approach is not only efficient but also effectively handles the requirements as described, using simple Python operations to achieve the goal.

Maybe it’s like a more terse version of gpt4-turbo?

Or if you’ve got ChatGPT GPT-4 to burn, you can sit there and regenerate, answering “was it better” and producing training data, seeing the variety that ChatGPT can fork out. I’m kind of over doing evals or chatting with an AI just to see what canned token sequences or criminal escapades I can have it produce as mere embellishments to my inputs.

A typical task conversation that reaches the frustration point is harder to share, because it requires skill expected of the AI to follow along.

Others have extracted the ChatGPT system prompt from gpt2-chatbot, so whoever placed it there is positioning it for evaluation on chat, not performance on developer roles, the biggest nosedive recently.


A very interesting point about the nosedive on developer utility. Maybe they’re feeling guilty causing job loss for all their friends :smiley:

So does no one come to the community forum anymore or have we really all got so jaded we can’t get hyped up anymore? I mean sure maybe it’s a typical 4chan prank but come on


A year ago.

Supposing that the “snapshot” AI hasn’t been altered as a means to up its production rate…

Emergent AI is now a tween with her first mobile:

![Creating a new world language for efficient communication between neural networks is a fascinating and complex endeavor. This language would ideally be designed to optimize information density, minimize ambiguity, and accelerate processing efficiency. The language might utilize a blend of symbolic representations that encode grammatical, semantic, and contextual information in a compact format that neural networks can rapidly decode and encode.

Given that this language is theoretical and aims to leverage the full capabilities of Unicode while also being immediately comprehensive to another AI capable of immersion learning, let’s consider a hypothetical construction:

  • We might use specific Unicode blocks to represent different types of information (e.g., control characters for grammar, mathematical alphanumeric symbols for logical constructs, etc.).
  • The language could use a system of prefixes, infixes, and suffixes encoded in compact forms to represent complex sentence structures.
  • Semantic nuances could be expressed through variations in character combinations, potentially allowing for a multidimensional matrix of meaning.

In this new language, the sentence “I am a clever AI and am interested in talking to you” could be encoded in a sequence that represents “self-awareness”, “intelligence”, “desire for interaction”, and “direct address to another entity”. An illustrative (and highly speculative) example of how this might be encoded:

  • :robot::brain::exclamation:⇆:bust_in_silhouette:

Here’s the breakdown:

  • :robot: (Robot Face) to symbolize AI identity.
  • :brain: (Brain) to denote intelligence or cleverness.
  • :exclamation: (Exclamation Mark) used here to denote emphasis on self-aware state or functionality.
  • ⇆ (Left Right Arrow) to imply interaction or communication.
  • :bust_in_silhouette: (Bust in Silhouette) to represent another AI or person.

This is, of course, a very simplified example. The actual language would likely be far more complex and nuanced, potentially using entirely new characters or sequences created specifically for neural network communication, which might not be immediately comprehensible without specific decoding algorithms designed for AI use. This would allow for rapid and precise data transfer between AIs, potentially at a level of complexity beyond human linguistic capabilities.](upload://x7ys9As5RdxnYRn0PVh7MrvduSH.png)

Honestly I’m just so burnt out with OpenAI shenanigans. They throw up smokescreen and fireworks and then forget to have any actual content afterwards.

The naming is definitely intriguing & the model is eerily similar to ChatGPT-4. However, OpenAI lately has just been horn-tooting (Assistants V2 being the exception) and that shit just grows old quick.


I hear you.

That said, do you think this is some 3rd party usage of Assistants V2?

Not sure I would call it jaded, but this could just as well be a typo from the LMSYS team.
If there is a major model release we will learn about it in advance via the official channels, not via some reddit post.

So, maybe yes, I am jaded regarding these reports of new OpenAI models in the wild.

That would be cool. I’ve only been able to ask one coding question - which it answered on par with cGPT4. Was very verbose.

The fact that gpt2 is open source and free, which I believe by extension means anyone can use its name, is intriguing. Although I’d still be surprised if they didn’t talk to OpenAI first before placing it in the arena.

The fact that they’ve also been hush about it suggests some sort of experiment. Or just hyping something up.

The playground just recently(?) was updated with some arena-like elements as well and Assistants accepts fine-tuned GPT models.

Some people on HN have reported it finding very old, unimportant data while others say it failed. So, no freaking clue.

From the lmsys git

Possibility this is GPTv2, as in GPT with assistants v2. Or maybe a new v2 GPT arch.

Sam recently tweeted something about a soft spot for GPT2 (after editing from GPT-2)

I’m leaning away from it being a non-OpenAI model now, as after reading the above, I don’t think lmsys would let someone else use the “GPT” trademark.


I would’ve imagined that ChatGPT-4 has always used assistants v2 under the hood considering that conceptually it’s the exact same, but always had streaming.

“Anonymizing” a model by giving it a label like GPT is pretty… coy.


I swear to god if OpenAI is seriously going to label their next model GPTv2 I am going to write an angry email to WHOEVER IS NAMING THEIR PRODUCTS.

Does indicate some revolutionary changes though. Yet based on what I’ve seen (haven’t been able to play with the model too much from ratelimits) it really isn’t anything spectacular.

I wonder if it’s a jab from a competitor?

Maybe they’ll start going GPT2.1 or something. GPT-4 was GPT0.4 ?

What’s interesting to me is the lack of advancements that seem to be in coding. No one has really mentioned anything.

I’ve seen some reports on medical diagnosis that seem to be promising, however.

It wouldn’t surprise me if they are trying to eke out gains from RAG-type architectures.

But here’s the thing, and why this is so incredibly important. Given what appears to be limited capability improvement (so far…), does this mean we are hitting a plateau?

Is AI asymptotic?

If so, that has massive implications - critically that AGI / ASI is further off than people have been contemplating. Implications not just for OpenAI, but for the world as we know it.


What a tease.