Here is a good video explanation of the paper "ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning". My hypothesis is that including as many types of natural language tasks as possible helps the model develop a more general way of accessing the information it stores. I think the number of distinct concepts introduced by task variation may turn out to be the key variable for generalization. The Google FLAN paper took some steps in this direction, but its prompt variation did not introduce many new concepts. For example, many of the instructional templates appear to vary only the location of the task components (before the prompt, after the prompt, or with the premise before and the hypothesis and answer options after). It seems there are only two overarching concepts in the templates for most datasets: what and where. 1) What is the task to be done (typically only one of these per dataset)? (e.g., comparison, progression, conclusion, factual knowledge, arithmetic operations, summarization). 2) Where are the components that hold the information for the task? (before, interspersed, after).
In other words, I think you could get even better results by training the language model on as many specific natural language tasks as possible simultaneously, while also including programmatic prompt variation like FLAN's (with as many conceptual variations of the prompt as possible). I have developed an initial list of instructional templates that introduce additional concepts for varying the prompts.
1). binary choice (also with synonym variations for answers: T, F, True, False, Yes, No, Y, N, YES, NO)
2). multiple choice (also with variations for referencing: A, B, C, D, or the correct word or sentence identified from a list)
3). multiple choice with more than one correct answer or no correct answers
4). synthesize answer from scrambled answer (correct sentence/word scrambled in various ways: spaces removed, split and rearranged)
5). errors introduced (typos, wrong spacing, wrong punctuation, wrong word used, as in “We went over to they’re house.”, wrong tense, wrong plurality/singularity)
6). order of information to be compared
7). synonym substitutions
8). string operations (split into parts, split into phonemes, split out prefix/suffix)
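As a rough illustration of how a few of these templates (1, 2, 4, and 5) could be generated programmatically, here is a minimal Python sketch. The function names, the prompt wording, and the small confusion table are placeholders I made up for illustration; none of them come from FLAN or ExT5, and a real pipeline would need far more variation than this.

```python
import random

# Hypothetical answer vocabularies for template 1 (binary choice).
BINARY_FORMS = [("T", "F"), ("True", "False"), ("Yes", "No"), ("Y", "N"), ("YES", "NO")]

# Hypothetical word confusions for template 5 (errors introduced).
CONFUSIONS = {"their": "they're", "your": "you're", "its": "it's"}

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def binary_choice(statement, is_true, rng):
    """Template 1: true/false question with a randomly chosen answer vocabulary."""
    pos, neg = rng.choice(BINARY_FORMS)
    prompt = f"{statement}\nAnswer {pos} or {neg}."
    return prompt, (pos if is_true else neg)


def multiple_choice(question, options, correct_index, rng, by_letter=True):
    """Template 2: shuffle the options; the target is the letter or the option text."""
    order = list(range(len(options)))
    rng.shuffle(order)
    lines = [f"{LETTERS[i]}. {options[j]}" for i, j in enumerate(order)]
    prompt = question + "\n" + "\n".join(lines)
    # Position of the correct option after shuffling.
    new_pos = order.index(correct_index)
    target = LETTERS[new_pos] if by_letter else options[correct_index]
    return prompt, target


def scramble_answer(answer, rng):
    """Template 4: split the answer into words and rearrange them."""
    words = answer.split()
    rng.shuffle(words)
    prompt = "Rearrange these words into a correct sentence: " + " ".join(words)
    return prompt, answer


def introduce_error(sentence):
    """Template 5: swap in one commonly confused word; the target is the original."""
    words = sentence.split()
    for i, word in enumerate(words):
        if word in CONFUSIONS:
            words[i] = CONFUSIONS[word]
            break  # introduce only one error per example
    prompt = "Fix the error in this sentence: " + " ".join(words)
    return prompt, sentence


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are repeatable
    examples = [
        binary_choice("Paris is the capital of France.", True, rng),
        multiple_choice("Which city is the capital of France?",
                        ["Paris", "Rome", "Oslo", "Madrid"], 0, rng),
        scramble_answer("We went over to their house.", rng),
        introduce_error("We went over to their house."),
    ]
    for prompt, target in examples:
        print(prompt)
        print("TARGET:", target)
        print()
```

Each function returns a (prompt, target) pair, so the same underlying example can be expanded into several conceptually different training instances.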
This could be a useful start to a research project on developing even more versatile and accurate models that respond in the way we usually desire.