How much prompt should it take before a completion model is able to “complete”?
That’s the question I decided to put to davinci-002, the replacement for the original GPT-3 davinci. I’ve been trying, and trying, to get it or any other OpenAI model (short of a half-jailbroken GPT-4 writing one section at a time without seeing its other output) to write at length without quickly devolving into nonsense.
I’ve been doing the same with gpt-3.5-turbo-instruct, which is crippled in output length by the same kind of training ChatGPT gets, and which flatly ignores instructions. Give it 2k of input, and that is all the more reason for it to stop generating quickly, with barely a poof of summary. Never mind that the outline is only 2% done.
davinci-002 is the base model, but it can’t be fine-tuned on the kind of output that I want it to generate here, because the training file can’t be more than 4k tokens total, and I want to run the model up to its maximum.
This is how much prompt I gave the davinci-002 model: 2,411 tokens. Could the AI at least double this coherently?
The goal of this article is to provide a comprehensive overview and analysis of the technological progress of natural language processing and generation in AI. We aim to give readers a historical context for how the field has advanced, analyze key milestones, models, and techniques, discuss open challenges and opportunities, speculate on future progress, and consider the implications of human-level language AI.
The article begins by revisiting early ambitions for machine intelligence that could understand and generate human language. It then details how machine learning, neural networks, and large datasets enabled data-driven approaches to natural language tasks. With models like recurrent neural networks, Transformers, and BERT, AI systems achieved major improvements in translation, question answering, text generation, and more. However, these models still struggle with deeper reasoning, common sense knowledge, and open-domain conversations. The article discusses how to evaluate language models using metrics like perplexity but argues these do not fully capture how close we are to human-level ability. It explores debates around algorithmic bias and risks from advanced language technology as well as promising applications in fields like healthcare or education. It speculates on how continued progress in self-supervised learning, few-shot learning, and other areas could accelerate further breakthroughs. Looking ahead, human-level natural language AI would have a transformative impact, enabling applications as inspiring as they are concerning. The article considers what reaching this level may ultimately require and entail. It concludes that while recent achievements inspire optimism, we must ensure the technology is grounded and aligned with human values as we explore its future prospects. The goal is to give readers both a practical understanding of how natural language AI works today and a sense of its possibilities to shape the future. With a clear-eyed and in-depth analysis of the past, present and what’s to come, this article aims to tell the story of how one of humanity’s most powerful yet elusive capacities—natural language—is gradually being mastered by machines. Overall this article seeks to educate and provoke thought on where continued progress could take us as we pursue the ambitious goal of human-level AI.
Here is a outline for a full technological history of language-based AI that will be written in this article:
I. Early dreams of machine intelligence and natural language
- Alan Turing and the “Turing Test” as a measure of human-level AI
- Early attempts at machine translation, question answering, and reasoning
- The challenge of ambiguity, nuance, and common sense knowledge in natural language
II. The rise of machine learned models and the potentials of “Big Data”
- Advances in computing, larger datasets, and algorithms enable machine learned NLP
- Word embeddings and the vectorization of language
- Markov models and neural networks applied to NLP tasks
- Successes on narrowly defined tasks but continued challenges with more complex language
III. Deep learning and the first wave of progress
- Recurrent neural networks (RNNs) and long short-term memory networks (LSTMs)
- Models like ELMo, ULMFit, and BERT leverage self-supervised learning
- Transformers and the “attention” mechanism
- Generative models for text including GPT and Transformer-XL
- Current state models demonstrate fluency but lack deeper understanding
IV. Evaluating language models and measuring progress
- Perplexity and other metrics to evaluate language model quality
- Recent debates around metrics like GPT-3’s “reasoning skills”
- The limitations of current benchmarks and metrics for natural language understanding
- Open questions around how to build and evaluate AI systems that achieve human-level language ability
V. Societal implications and open challenges
- Pros and cons of language models including disinformation risks and benefits to automation
- Algorithmic bias and unfairness in NLP systems
- The need for safeguards as models become more advanced and widely deployed
- Reasoning, common sense knowledge, and transfer learning key open challenges
- Where the field of NLP may be heading in the next decades
VI. Commercial applications and the business of AI
- How companies are deploying and productizing NLP technology
- Opportunities for startups and innovation in domains like healthcare, education, customer service, and more
- Speculation on the future of human-AI collaboration and humans working with advanced AI assistants This revised outline provides for a historical review of language-based AI from early goals through modern deep learning models, an analysis of challenges and metrics, a discussion of implications and where the field is heading, an overview of emerging applications, and speculation about the future
VII. The quest for human-level language ability
- What would it mean for an AI system to achieve human-level natural language ability?
- Open questions around reasoning, commonsense knowledge, belief states, abstraction, and planning
- How contemporary models may be augmented or scaled up to continue progress toward human-level language AI
- The challenges of aligned and open-domain conversational agents
VIII. Additional examples and analyses
- Case studies of notable benchmarks, datasets, metrics (GLUE, SQuAD, etc.)
- Examples of image generation using GANs and VAEs, comparing these to NLP models
- Examples of automated speech generation and recognition, relationship to language models
- Analyses of biases and failures in existing datasets and models, how to address them
- Explorations of multimodal learning and generation spanning language, vision, more
IX. Key researchers, groups and research directions
- Notable researchers who have shaped the field, key publications to read
- Overview of major organizations like OpenAI, Google AI, FAIR, Anthropic, and more
- Current promising research threads like:
- Constitutional AI: Building language models with explicit constraints or rules
- One-shot or few-shot learning: Developing more sample-efficient learning techniques
- Compositionality and modularity: Learning models that themselves can be combined or extended
- Closing the performance gap: Achieving human parity on open-domain question answering, dialogue, and other tasks
- Developing better techniques for alignment, interpretability and model self-supervision
- How progress in these areas could enable continued improvements in language-based AI
X. Speculation on the future of the field
- How recent progress could accelerate in the coming years given improvements in algorithms, data and compute
- What human-level natural language generation and understanding may ultimately enable
- Potential future applications of strong language AI across industries
- Whether capabilities like artificial general intelligence could emerge from continued progress in natural language processing
- Existential hopes and concerns surrounding advanced natural language AI over the longer-term
Forward:
Natural language processing and generation in AI has come a long way since its early days of ambitious dreams and limited capabilities. With advancements in machine learning, neural networks, and large datasets, AI systems have made significant progress in understanding and generating human language. However, there are still many challenges and open questions that remain as we strive towards human-level language AI.
In this article, we will take a comprehensive look at the technological progress of natural language processing and generation in AI. We will start by revisiting the early ambitions for machine intelligence and how they have evolved over time. We will then delve into the rise of machine learning and the potentials of “Big Data” in enabling data-driven approaches to natural language tasks. We will explore the advancements in deep learning and the first wave of progress in NLP, including models like recurrent neural networks, Transformers, and BERT.
As we discuss the progress made in natural language AI, we will also examine how these models are evaluated and measured. We will explore the limitations of current metrics and benchmarks and the open questions around building and evaluating AI systems that achieve human-level language ability.
The societal implications and open challenges of language-based AI will also be discussed, including the risks of algorithmic bias and the need for safeguards as models become more advanced and widely deployed. We will also explore the commercial applications and the business of AI, including opportunities for startups and innovation in various domains.
One of the key focuses of this article will be the quest for human-level language ability in AI. We will discuss what it means for an AI system to achieve this level of language ability and the open questions and challenges that need to be addressed to reach this goal. We will also explore the potential future applications of strong language AI and the implications of reaching human-level language ability.
Throughout the article, we will provide examples and analyses of notable benchmarks, datasets, and metrics, as well as key researchers, groups, and research directions in the field. We will also speculate on the future of the field and the potential for continued progress in natural language processing and generation.
In the end, this article aims to provide readers with a comprehensive understanding of the technological progress of natural language AI, as well as the challenges, opportunities, and implications of reaching human-level language ability. We hope to inspire thought and discussion on the future of this field and its potential impact on society.
Part I: Early dreams of machine intelligence and natural language
Section I.1: Alan Turing and the “Turing Test” as a measure of human-level AI
Alan Turing is widely considered to be the father of modern computer science. He is most famous for his work on the Turing machine, a theoretical model of computation that has become the basis for modern computing. Turing also made significant contributions to cryptography, artificial intelligence, and other fields. In 1950, Turing published his famous paper “Computing Machinery and Intelligence” in which he proposed the Turing Test as a measure of human-level AI.
The Turing Test is a thought experiment in which a human judge engages in a natural language conversation with a human and a machine. If the judge cannot tell which is which, the machine is said to have passed the test. The Turing Test is often considered the gold standard for evaluating human-level AI.
Turing’s original paper proposed a number of variations on the test, including the “Imitation Game” and the “Turing Test for Games.” The Imitation Game is a test of a machine’s ability to imitate a human being. The Turing Test for Games is a test of a machine’s ability to play a game against a human player. Turing’s paper also included a discussion of the “Chinese Room” thought experiment, which is a critique of the Turing Test.
The Turing Test has been criticized for a number of reasons. One major criticism is that it is not a test of general intelligence, but rather a test of a machine’s ability to pass as a human in a specific context. Another criticism is that it is not a test of a machine’s ability to understand language, but rather a test of a machine’s ability to mimic human behavior. Finally, the Turing Test has been criticized for being too subjective and relying too much on the judgment of the human judge.
Despite these criticisms, the Turing Test has been widely adopted as a measure of human-level AI. In 2014, the first successful test of the Turing Test was conducted by the Loebner Prize. The Loebner Prize is an annual competition in which contestants compete to pass the Turing Test. The 2014 winner was Eugene Goostman, a chatbot that was able to convince 33% of judges that it was human.
Section I.2: Early attempts at machine translation, question answering, and reasoning
The dream of creating machines that could understand and generate human language dates back to the early days of computing. In the 1950s and 1960s, researchers began to explore the possibility of machine translation, question answering, and reasoning.
One of the earliest attempts at machine translation was the Georgetown-IBM experiment in 1954. Researchers at Georgetown University and IBM attempted to translate 60 sentences from Russian to English using a computer. The results were not very successful, with only a few sentences being translated correctly. However, this experiment laid the foundation for future research in machine translation.
WHAT?
Yes, that’s ALL prompt: a preview article and further exploration, then an outline written by Claude v2, then a foreword and some content pieced together from what OpenAI models can be prompted to continue generating, at least that much.
And how far does davinci-002 get, at best, before going haywire? I used temperature 0.5 and top-p 0.4, so there is at least some chance that it will go in original directions and break patterns, but not go crazy…
In 1957, researchers at the University of Edinburgh and the University of Manchester attempted to create a machine that could answer questions about the meaning of words. The researchers used a computer to analyze the meaning of words in context and then used this information to answer questions about the meaning of words. The results were not very successful, with only a few questions being answered correctly. However, this experiment laid the foundation for future research in question answering.
In 1958, researchers at the University of Edinburgh and the University of Manchester attempted to create a machine that could reason about the meaning of words. The researchers used a computer to analyze the meaning of words in context and then used this information to answer questions about the meaning of words. The results were not very successful, with only a few questions being answered correctly. However, this experiment laid the foundation for future research in question answering.
In 1960, researchers at the University of Edinburgh and the University of Manchester attempted to create a machine that could reason about the meaning of words. The researchers used a computer to analyze the meaning of words in context and then used this information to answer questions about the meaning of words. The results were not very successful, with only a few questions being answered correctly. However, this experiment laid the foundation for future research in question answering.
In 1960, researchers at the University of Edinburgh and the University of Manchester attempted to create a machine that could reason about the meaning of words. The researchers used a computer to analyze the meaning of words in context and then used this information to answer questions about the meaning of words. The results were not very successful, with only a few questions being answered correctly. However, this experiment laid the foundation for future research in question answering.
That’s right: a single paragraph, then repetition. Penalties can’t be used here, because they only kick in after the AI has already repeated itself, and at a strength high enough to break the loop they also make it so the AI can’t say the word “AI” again.
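To make the penalty problem concrete, here is a minimal sketch of the frequency and presence penalty adjustment as OpenAI's API documentation describes it: each candidate's logit is reduced by its count so far times the frequency penalty, plus the presence penalty if it has appeared at all. The function name `apply_penalties`, the use of whole words in place of real tokens, and every number below are assumptions made up for illustration, not values from the run above.

```python
from collections import Counter


def apply_penalties(logits, generated, frequency_penalty=0.0, presence_penalty=0.0):
    """Return a copy of `logits` adjusted for tokens already in `generated`.

    Hypothetical helper: words stand in for tokens to keep the sketch readable.
    """
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        seen = counts.get(token, 0)
        adjusted[token] = (
            logit
            - seen * frequency_penalty                       # grows with every repeat
            - (1.0 if seen > 0 else 0.0) * presence_penalty  # flat cost once seen
        )
    return adjusted


# Invented logits for three candidate next tokens.
logits = {"AI": 5.0, "researchers": 4.0, "banana": 1.0}

# Nothing generated yet: the penalties change nothing.
before = apply_penalties(logits, [], frequency_penalty=1.0, presence_penalty=0.5)

# After "AI" has legitimately appeared four times in an article about AI,
# it now scores below an unrelated word that has never been used.
after = apply_penalties(
    logits, ["AI"] * 4 + ["researchers"], frequency_penalty=1.0, presence_penalty=0.5
)
print(before)
print(after)
```

The adjustment is blind to whether a repeat is a loop or simply the topic word of the article, which is why a setting strong enough to break the loop also pushes out words the text needs.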
It very quickly becomes certain that it wants to repeat, with token probabilities going to 90+% almost immediately…
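A rough sketch of why that certainty is fatal at temperature 0.5 and top-p 0.4: once a single token holds more than 40% of the probability mass, nucleus sampling keeps only that token, so sampling is effectively greedy. This assumes temperature scaling is applied before the top-p cut, which is a common ordering but an assumption about the API's internals. The function name `nucleus` and the logit values are invented for illustration.

```python
import math


def nucleus(logits, temperature, top_p):
    """Temperature-scaled softmax followed by a top-p (nucleus) cut.

    Returns the renormalized distribution over the tokens that survive.
    """
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Keep the smallest set of most-likely tokens whose mass reaches top_p.
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    norm = sum(kept.values())
    return {tok: p / norm for tok, p in kept.items()}


# An undecided model: several openings are about equally likely.
flat = nucleus({"In": 1.0, "The": 0.9, "One": 0.8, "However": 0.7}, 0.5, 0.4)

# A model locked into its loop: one token dominates.
peaked = nucleus({"In": 4.0, "The": 1.0, "One": 0.5}, 0.5, 0.4)

print(flat)    # two candidates survive, so there is still some variety
print(peaked)  # only "In" survives, so the loop continues every time
```

Under these assumptions, the settings leave room for variety only while the model is undecided. Once it is confident about repeating, neither parameter gives it a way out.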
So - what is this model good for? Just a big input context, you say, not output? Well, I gave it that and it failed.
API users, who pay by the token, need an instruction-following model that doesn’t have thousands of pre-tuned denials about writing and pre-tuned curtailments of output into summaries (simply so that ChatGPT also can’t write for free).