ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning (Paper Explained)

jackcole · December 1, 2021, 4:14pm

Here is a good video explanation of the paper "ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning." My hypothesis is that including as many types of natural language tasks as possible helps the model develop a more general way to access the information stored in it. I think the number of distinct concepts introduced by task variation may turn out to be the key variable for better generalization. The Google FLAN paper took some steps in this direction, but its prompt variation did not introduce many new concepts. For example, many of the instruction templates appear to vary only the location of the task components (before the prompt, after the prompt, or the premise before with the hypothesis and answer options after). For most datasets, the templates seem to cover only two overarching concepts: what and where.
1) What is the task to be done? (typically only one per dataset; e.g., comparison, progression, conclusion, factual knowledge, arithmetic operations, summarization)
2) Where are the components that supply the information for the task? (before, interspersed, after)
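To make the "what/where" point concrete, here is a minimal Python sketch of that kind of template variation. The task instructions, the `TASK_INSTRUCTIONS` table, and the helper names are hypothetical illustrations, not FLAN's actual templates or code. It assumes a prompt is just the instruction plus a list of input parts joined by newlines.

```python
# Hypothetical task instructions ("what"); illustrations only, not FLAN's templates.
TASK_INSTRUCTIONS = {
    "entailment": "Does the premise entail the hypothesis? Answer yes or no.",
    "summarization": "Summarize the following text in one sentence.",
}
PLACEMENTS = ("before", "interspersed", "after")


def place_instruction(instruction: str, parts: list[str], where: str) -> str:
    """Position the instruction relative to the input parts ("where")."""
    if not parts:
        raise ValueError("need at least one input part")
    if where == "before":
        ordered = [instruction, *parts]
    elif where == "after":
        ordered = [*parts, instruction]
    elif where == "interspersed":
        # Put the instruction after the first part (e.g., between premise and hypothesis).
        # With a single part this coincides with "after".
        ordered = [parts[0], instruction, *parts[1:]]
    else:
        raise ValueError(f"unknown placement: {where}")
    return "\n".join(ordered)


def what_where_variants(task: str, parts: list[str]) -> list[str]:
    """One prompt per placement for a single task: 1 'what' x 3 'where' values."""
    instruction = TASK_INSTRUCTIONS[task]
    return [place_instruction(instruction, parts, w) for w in PLACEMENTS]


if __name__ == "__main__":
    example = ["Premise: The cat sat on the mat.", "Hypothesis: A cat is sitting."]
    for prompt in what_where_variants("entailment", example):
        print(prompt, end="\n---\n")
```

Every variant carries the same task; only the position of the instruction changes. That is the limited conceptual variety described above: three surface forms, but still one "what."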

In other words, I think you could get even better results by training the language model simultaneously on as many specific natural language tasks as possible, while also including programmatic prompt variation like FLAN's (with as many conceptual variations of the prompt as possible). I have drafted an initial list of instruction templates that introduce additional concepts for varying prompts:

1). binary choice (with synonym variations for the answers: T/F, True/False, Yes/No, Y/N, YES/NO)
2). multiple choice (with variations in how answers are referenced: letters A, B, C, D, or the correct word or sentence identified from a list)
3). multiple choice with more than one correct answer or no correct answer
4). synthesize the answer from a scrambled answer (the correct sentence or word scrambled in various ways: spaces removed, split and rearranged)
5). introduced errors (typos, wrong spacing, wrong punctuation, wrong word, e.g., "We went over to they're house.", wrong tense, wrong number: singular vs. plural)
6). varied order of the information to be compared
7). synonym substitutions
8). string operations (split into parts, split into phonemes, split off a prefix/suffix)
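Several items on the list above can be generated programmatically. The sketch below covers items 1 through 5 and the suffix part of item 8. All names, answer vocabularies, and lookup tables (`BINARY_FORMATS`, `HOMOPHONES`, `SUFFIXES`) are hypothetical and deliberately tiny; the typo model (transposing two adjacent characters) is just one simple choice I swapped in, and phoneme splitting is omitted because it would need a pronunciation lexicon. Treat it as a starting point under those assumptions, not a finished data pipeline.

```python
import random

# Hypothetical answer vocabularies for binary-choice prompts (item 1).
BINARY_FORMATS = [("T", "F"), ("True", "False"), ("Yes", "No"), ("Y", "N"), ("YES", "NO")]
# Tiny illustrative lookup tables; a real pipeline would need far larger ones.
HOMOPHONES = {"their": "they're", "there": "their", "your": "you're"}
SUFFIXES = ("ing", "ed", "ness", "ly", "s")


def binary_choice(question: str, label: bool, rng: random.Random) -> tuple[str, str]:
    """Item 1: pick one answer vocabulary at random and state it in the prompt."""
    pos, neg = rng.choice(BINARY_FORMATS)
    return f"{question} Answer {pos} or {neg}.", (pos if label else neg)


def multiple_choice(question: str, options: list[str], correct: set[int],
                    use_letters: bool, allow_multiple: bool) -> tuple[str, str]:
    """Items 2 and 3: reference answers by letter or by option text.
    With allow_multiple, `correct` may hold several indices or none."""
    letters = "ABCDEFGH"  # supports up to eight options
    lines = [question] + [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    if allow_multiple:
        lines.append("List every correct option, or answer 'None of the above'.")
    if not correct:
        target = "None of the above"
    elif use_letters:
        target = ", ".join(letters[i] for i in sorted(correct))
    else:
        target = "; ".join(options[i] for i in sorted(correct))
    return "\n".join(lines), target


def scramble(sentence: str, mode: str, rng: random.Random) -> str:
    """Item 4: build a scrambled input; the training target is the original sentence."""
    if mode == "no_spaces":
        return sentence.replace(" ", "")
    if mode == "shuffle_words":
        words = sentence.split()
        rng.shuffle(words)
        return " ".join(words)
    if mode == "swap_halves":
        mid = len(sentence) // 2  # cut at the character midpoint and swap the halves
        return sentence[mid:] + sentence[:mid]
    raise ValueError(f"unknown mode: {mode}")


def introduce_error(sentence: str, rng: random.Random) -> str:
    """Item 5: inject one wrong-word or typo error; the target is the original.
    Words with attached punctuation (e.g. "their,") are not matched as homophones."""
    words = sentence.split()
    swappable = [i for i, w in enumerate(words) if w.lower() in HOMOPHONES]
    if swappable and rng.random() < 0.5:
        i = rng.choice(swappable)
        words[i] = HOMOPHONES[words[i].lower()]  # drops the original capitalization
    else:
        candidates = [i for i, w in enumerate(words) if len(w) >= 2]
        if not candidates:
            return sentence
        i = rng.choice(candidates)
        w, j = words[i], rng.randrange(len(words[i]) - 1)
        # Transpose two adjacent characters (a no-op if they happen to be identical).
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def split_suffix(word: str) -> tuple[str, str]:
    """Item 8: naive suffix split against the small SUFFIXES list above."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)], suf
    return word, ""


if __name__ == "__main__":
    rng = random.Random(42)
    print(binary_choice("Is Paris in France?", True, rng))
    print(multiple_choice("Which are fruits?", ["apple", "rock", "pear"], {0, 2}, True, True))
    print(scramble("the quick brown fox", "shuffle_words", rng))
    print(introduce_error("We went over to their house.", rng))
    print(split_suffix("walking"))
```

Passing a seeded `random.Random` into each function keeps the generated variations reproducible, which makes it easier to compare training runs that use different mixes of templates.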

This could be a useful starting point for a research project on developing even more versatile and accurate models that respond the way we usually want.
