Understanding How Models Recognize Roman Numerals and Detect Questions

Hello,

I’m new to prompting and have a couple of questions regarding LLMs (Large Language Models) that I’m curious about. I’m hoping someone can provide some insights.

  1. How do LLMs like GPT-3 understand and recognize Roman numerals? I’ve noticed that when given an example like “I, II, III,” the model automatically completes the sequence. How does this happen?

  2. When I provide a prompt containing a question, it appears that the model can recognize that I’m asking a question. Could someone explain how this works? Is there a specific way the model detects questions within the text?

I’d like to understand how LLMs work, and I would greatly appreciate hearing from anyone who can shed some light on these questions.

Thank you for your time.

Why does ChatGPT answer questions even when there are none?

The answer: fine-tuning and reinforcement learning from human feedback, which are fed into the model and prime it not to commiserate about your day, but to answer the perceived question.

OpenAI has millions of examples of these different tasks, and they shape how the output is generated. Keep pressing thumbs up.

The result is what you see: a completion AI that has been transformed from merely using its language knowledge to continue writing where you left off into a task-performing, instruction-following…and question-answering chatbot.
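Here is a minimal sketch of that difference, assuming the `openai` Python package (v1-style client), an `OPENAI_API_KEY` in your environment, and illustrative model names:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = "I, II, III,"

# A base completion model just continues the text where you left off.
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",   # illustrative completion-style model
    prompt=prompt,
    max_tokens=10,
)
print("Completion model:", completion.choices[0].text)

# A fine-tuned chat model treats the same input as something to respond to.
chat = client.chat.completions.create(
    model="gpt-3.5-turbo",            # illustrative chat-tuned model
    messages=[{"role": "user", "content": prompt}],
)
print("Chat model:", chat.choices[0].message.content)
```

The first call tends to simply continue the sequence; the second tends to explain or answer, because the chat model has been fine-tuned to treat your input as a request.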


It is the vast number of examples that lets the AI know that after “the monkey ate a”, the next likely word is “banana”. However, it is the fine-tuning that lets it know that after user input, the AI answers a question.
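To make the “next likely word” idea concrete, here is a toy sketch of the mechanism: the model assigns a score to every candidate token, and a softmax turns those scores into probabilities. The numbers below are invented purely for illustration:

```python
import math

# Made-up scores (logits) for candidate next tokens after "the monkey ate a".
logits = {"banana": 6.1, "sandwich": 4.3, "rock": 1.2, "question": 0.4}

# Softmax: exponentiate and normalize so the scores become probabilities.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>9}: {p:.3f}")
# "banana" comes out most likely, so greedy decoding would pick it.
```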

The AI has also ingested and been pretrained on tons of books, documents, and papers, a good fraction of human knowledge, so it not only follows the form of an outline whose sections are numbered in Roman numerals, but can even answer questions about them.

You can even ask the AI to do some forum poster’s GRE test (with fat fingers).



Hi and welcome to the Developer Forum!

“How” is an interesting question. As I understand the current theory, when you first start to train a model on text data, the first thing it detects is that some groups of letters occur more often than others. Then it finds that certain letters are far more prevalent than others: some seem to mark boundaries, some separate groups of letters. These are, of course, things like spaces, punctuation, and the common letters of the alphabet.

So the model encodes rules that describe and encapsulate basic letter and grammar patterns. Next comes the discovery of higher-level rules: specific patterns of letters are found more often than others, and these are words. Then comes the observation that some words appear next to certain other words more often than not, and the algorithm starts to detect rules about which of those groups of “words” go with others, in larger and larger groups; now we have sentences and paragraphs.
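As a rough, hand-rolled illustration of the kind of statistics involved (real training uses a neural network and gradient descent, not explicit counting), these patterns fall out of even a tiny made-up corpus:

```python
from collections import Counter

# Toy corpus; the strings are invented for illustration.
corpus = "the monkey ate a banana. the monkey ate an apple. I II III IV"

char_counts = Counter(corpus)                 # spaces and common letters dominate
words = corpus.replace(".", "").split()
word_counts = Counter(words)                  # "the", "monkey", "ate" repeat
bigrams = Counter(zip(words, words[1:]))      # "monkey" is usually followed by "ate"

print(char_counts.most_common(3))
print(word_counts.most_common(3))
print(bigrams.most_common(3))
```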

Hopefully you get the idea: ever-increasing levels of complexity become encoded and “understood”. At some point in the training, things like basic logic and maths are found to be encoded in the information we all put online; this is the “emergent” behaviour that was/is so hotly debated in some circles.

So, to answer your question: essentially, the rules for what a Roman numeral IS became encoded at some point during training, and the same for what a question IS. In order to correctly guess what the next word in a sentence will be… you have to understand how the world those words describe works.
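To make “the rules for what a Roman numeral is” concrete, here they are written out explicitly. The model never sees code like this; it infers an equivalent pattern from the countless numbered outlines and lists in its training data, which is why “I, II, III,” is statistically likely to be followed by “IV”:

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a Roman numeral using the standard rules."""
    values = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"),
        (1, "I"),
    ]
    out = []
    for value, symbol in values:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

print([to_roman(i) for i in range(1, 6)])  # ['I', 'II', 'III', 'IV', 'V']
```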
