Best way to create responses that exceed token length

If you haven’t already done so, I would recommend taking an iterative approach to summarizing the larger content. Use an NLP library (spaCy, Gensim, or NLTK) to break your story down into paragraphs, then split each paragraph into sentences. Run some summarization method on the paragraph as a whole, then on each sentence individually, and compare the sentence summaries against the paragraph summary. Did you like the result? If so, feed the paragraph summary and the sentence summaries into yet another summarization pass.

Did this maintain your data’s integrity? If not, another approach would be to replace keywords with close synonyms. If you use Word2Vec similarity to find candidate words, you could say the same thing using different words. Keep in mind that its nearest neighbors are related words rather than guaranteed synonyms, so check the substitutions.

I realize this was a ton of suggestions. Feel free to use whichever one helps you summarize your data. Cheers!
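In case it helps, here is a rough sketch of the paragraph-then-sentence loop described above. It uses only the Python standard library so it runs anywhere. The function names (`split_paragraphs`, `split_sentences`, `summarize`, `iterative_summary`) are made up for illustration, the regex splitting is a crude stand-in for what spaCy or NLTK would do, and `summarize` is just a word-frequency placeholder, not a real summarization model. Swap in whatever summarizer you actually use.

```python
import re
from collections import Counter


def split_paragraphs(text):
    """Split on blank lines (assumes paragraphs are separated that way)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive split after ., ! or ?; an NLP library handles abbreviations better."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def summarize(text, max_sentences=1):
    """Placeholder extractive summarizer: keep the sentences whose words
    are most frequent in the text, preserving their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def iterative_summary(text, summarize_fn=summarize):
    """For each paragraph: summarize the paragraph, summarize each sentence,
    then summarize the combination of both. Returns one summary per paragraph."""
    results = []
    for paragraph in split_paragraphs(text):
        paragraph_summary = summarize_fn(paragraph)
        sentence_summaries = [summarize_fn(s) for s in split_sentences(paragraph)]
        combined = " ".join([paragraph_summary, *sentence_summaries])
        results.append(summarize_fn(combined))
    return results


if __name__ == "__main__":
    story = (
        "Cats sleep a lot. Cats also eat. Dogs bark.\n\n"
        "The sky is blue. Birds fly in the sky."
    )
    for line in iterative_summary(story):
        print(line)
```

With the placeholder, a single sentence comes back unchanged, so the sentence-level pass only starts to pay off once `summarize_fn` is an abstractive model that can actually shorten a sentence. The comparison step ("did you like the result?") is left to you, since that is a judgment call on your own data.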
