One could directly predict the full sentence. Imagine the output layer as a vector of activations, where each activation represents a word in the vocabulary (basically how it is now). But instead of the activations being probabilities of the next word, each would simply be 1 if the word is present in the sentence and 0 if not. As a next step, I could have another transformer or neural net that puts these words into the right order so the sentence makes sense, as in the sketch below. I think this approach would speed up answering by a lot, since the whole word set comes out of a single forward pass instead of one pass per token. I also suppose this approach is not fully covered by existing non-autoregressive models, since the ordering step could still keep an autoregressive part.
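To make the idea concrete, here is a minimal sketch of the two stages, assuming PyTorch and a toy setup: stage 1 outputs a multi-hot "word presence" vector via per-word sigmoids (not a softmax), and stage 2 is a small autoregressive model that only has to order the selected words. All names, sizes, the threshold, and the stand-in prompt encoder are hypothetical illustrations, not a description of how any production model works.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 1000  # hypothetical toy vocabulary
HIDDEN = 256

class BagOfWordsPredictor(nn.Module):
    """Stage 1: maps a prompt encoding to per-word presence probabilities."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIDDEN, HIDDEN),
            nn.ReLU(),
            nn.Linear(HIDDEN, VOCAB_SIZE),
        )

    def forward(self, prompt_encoding):
        # Sigmoid, not softmax: each activation is an independent
        # "is this word in the sentence?" probability.
        return torch.sigmoid(self.net(prompt_encoding))

class Orderer(nn.Module):
    """Stage 2: a small autoregressive model (here just a GRU) that
    only has to order the words selected by stage 1."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.rnn = nn.GRU(HIDDEN, HIDDEN, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, token_ids):
        h, _ = self.rnn(self.embed(token_ids))
        return self.out(h)

# Usage sketch: threshold the stage-1 probabilities to get the word set
# in one forward pass, then greedily order it with stage 2, masking the
# logits so only unused selected words can be emitted.
predictor, orderer = BagOfWordsPredictor(), Orderer()
prompt_encoding = torch.randn(1, HIDDEN)                # stand-in for a real encoder
presence = predictor(prompt_encoding)                   # (1, VOCAB_SIZE)
selected = (presence > 0.5).nonzero(as_tuple=True)[1]   # ids of words "in" the sentence

ordered, remaining = [], set(selected.tolist())
tokens = torch.zeros(1, 1, dtype=torch.long)            # hypothetical BOS id 0
for _ in range(len(remaining)):
    logits = orderer(tokens)[0, -1]                     # next-word scores
    mask = torch.full_like(logits, float("-inf"))
    mask[list(remaining)] = 0.0                         # restrict to unused selected words
    next_id = int((logits + mask).argmax())
    ordered.append(next_id)
    remaining.discard(next_id)
    tokens = torch.cat([tokens, torch.tensor([[next_id]])], dim=1)

print(ordered)  # the selected words, in the order chosen by stage 2
```

Note that in this sketch the speedup would come entirely from stage 1: stage 2 still decodes word by word, which is why the overall scheme keeps an autoregressive part.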