Word currently has a “Rewrite Selection” feature that you can reach by right-clicking a sentence. The typical result is “no suggestions.” This seemed like an ideal application of GPT-3’s capabilities, so I decided to write a Word Add-In to see whether it could be done better. I fine-tuned a Curie model on a subset of TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages, which is licensed under Creative Commons Attribution 2.0 Generic. I wrote a custom program to convert the data into a form the API could accept. I kept only sentence pairs that had fewer than 70% of their words in common, because the dataset contained a substantial number of what I would call inconsequential rephrasings (such as small changes in verb tense or word order). I ended up with 66,400 sentence pairs for fine-tuning. The Add-In was coded in Visual Studio 2019.
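A minimal Python sketch of a 70% word-overlap filter follows. The post doesn’t say exactly how overlap was measured, so this assumes lowercase word tokens, ignores punctuation, and divides the number of shared distinct words by the size of the larger sentence’s word set. The function names are hypothetical.

```python
import re


def words(sentence: str) -> set:
    """Lowercase word tokens; punctuation is ignored (an assumption)."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def overlap_ratio(a: str, b: str) -> float:
    """Shared distinct words divided by the larger sentence's distinct-word count."""
    wa, wb = words(a), words(b)
    if not wa or not wb:
        return 1.0  # treat empty sentences as identical so they get filtered out
    return len(wa & wb) / max(len(wa), len(wb))


def keep_pair(a: str, b: str, threshold: float = 0.7) -> bool:
    """Keep only pairs that share fewer than `threshold` of their words."""
    return overlap_ratio(a, b) < threshold
```

With this definition, “I went to the store.” and “I went to the shop.” share 4 of 5 words (0.8) and would be dropped, while “He is very tall.” and “He’s a giant.” share none and would be kept.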
Along the way, Codex helped with several coding challenges. I used it to generate Python code to properly escape the JSON strings, and I used the JavaScript Playground to generate small functions, such as counting the words in a string and changing button text. These things aren’t hard to look up, but I found Codex faster for many problems. It’s a somewhat new way of thinking about coding, but it feels quite natural to me.
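For reference, Python’s standard json module can do the escaping on its own: build each record as a dictionary and let json.dumps serialize it. This sketch assumes the legacy GPT-3 fine-tuning format of one JSON object per line with prompt and completion keys. The separator and stop sequence are hypothetical placeholders, not values from the original project.

```python
import json

SEPARATOR = "\n\n###\n\n"  # hypothetical marker ending each prompt
STOP = " END"              # hypothetical stop sequence ending each completion


def to_jsonl_line(original: str, paraphrase: str) -> str:
    """Serialize one sentence pair as a single JSONL line."""
    record = {
        "prompt": original + SEPARATOR,
        "completion": " " + paraphrase + STOP,
    }
    # json.dumps escapes quotes, backslashes, and newlines, so each record
    # stays on one physical line of the output file.
    return json.dumps(record)


def write_jsonl(pairs, path):
    """Write (original, paraphrase) pairs to a UTF-8 JSONL file."""
    with open(path, "w", encoding="utf-8") as f:
        for original, paraphrase in pairs:
            f.write(to_jsonl_line(original, paraphrase) + "\n")
```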
Documenting progress in a Jupyter Notebook was helpful, as was using it to tune the parameters passed to the API until I found the right mix. Ultimately, I think the result is pretty good, and much better than Word’s built-in function. I plan to develop a set of author tools like this to help people improve their writing.
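To illustrate that kind of parameter tuning, here is a small sketch that builds a grid of request settings to compare side by side in a notebook. The parameter names (temperature, frequency_penalty, max_tokens) are real Completions API parameters, but the values swept here are hypothetical, and the actual API call with the fine-tuned model name is left out.

```python
from itertools import product


def parameter_grid(temperatures, frequency_penalties, max_tokens=64):
    """Return one settings dict per (temperature, frequency_penalty) combination."""
    return [
        {
            "temperature": temperature,
            "frequency_penalty": penalty,
            "max_tokens": max_tokens,  # cap on generated tokens per request
        }
        for temperature, penalty in product(temperatures, frequency_penalties)
    ]


# Hypothetical values to sweep; each dict would be sent with the same prompt.
grid = parameter_grid([0.3, 0.7, 1.0], [0.0, 0.5])
```

Varying only one or two parameters at a time keeps the number of combinations, and the API cost of trying them, manageable.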
Thanks. I did use it at the last step. The Python code to escape the JSON strings was necessary because the CLI crashed when it tried to read the file. The original data file also had extra fields and information that had to be stripped out. The sentences weren’t in pairs, either: each was in its own row with a reference number, and some sentences had multiple alternative phrasings.
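The regrouping step can be sketched roughly as follows. It assumes the extra fields have already been stripped so that each row is a (reference number, sentence) tuple; the real TaPaCo column layout may differ, and pairs_from_rows is a hypothetical helper name. Sentences sharing a reference number are treated as paraphrases of each other, and every unordered pair within a group becomes one training example.

```python
from collections import defaultdict
from itertools import combinations


def pairs_from_rows(rows):
    """Group (ref_id, sentence) rows by reference number and pair them up."""
    groups = defaultdict(list)
    for ref_id, sentence in rows:
        groups[ref_id].append(sentence.strip())
    pairs = []
    for sentences in groups.values():
        # A group of k sentences yields k * (k - 1) / 2 unordered pairs;
        # a group with a single sentence yields none.
        pairs.extend(combinations(sentences, 2))
    return pairs
```

A sentence with three alternative phrasings (four sentences in its group) yields six pairs, which is one reason the pair count grows quickly before any filtering.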
In the past I used a freemium browser extension (I don’t remember the name now) to rephrase content for website copywriting. Being able to do it all in MS Word would be a great utility.
I’m debating whether to monetize it or open-source it. With monetization, the GPT-3 API fees may be too high and/or unpredictable. Imagine someone who pays a fixed monthly fee of, say, $5 but uses $100 worth of tokens through the API.
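The arithmetic behind that concern is easy to sketch. The price per 1,000 tokens below is a hypothetical round number, not an actual OpenAI rate; the point is how many tokens a flat fee covers before the margin goes negative.

```python
def api_cost(tokens: int, price_per_1k: float) -> float:
    """Dollar cost of a given number of tokens at a per-1,000-token price."""
    return tokens / 1000 * price_per_1k


def breakeven_tokens(monthly_fee: float, price_per_1k: float) -> float:
    """Tokens per month a flat fee covers before API costs exceed it."""
    return monthly_fee / price_per_1k * 1000


PRICE_PER_1K = 0.01  # hypothetical price in dollars, for illustration only
FEE = 5.00           # the $5/month example from the post

print(breakeven_tokens(FEE, PRICE_PER_1K))  # about 500,000 tokens
heavy_user_tokens = 10_000_000              # tokens that cost $100 at this price
print(FEE - api_cost(heavy_user_tokens, PRICE_PER_1K))  # about -95 dollars
```

Under these assumptions, the $100 user in the example consumes twenty times the break-even volume.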
The rephrasing Add-In works quite well, and I have some ideas for making it even better. At times it generates phrases that don’t have the same meaning as the original, but I think that can be fixed (or greatly reduced).
That could work. I’m also considering fine-tuning one of the less expensive models. Currently I’m working with a fine-tuned Curie model, but one of the cheaper models might work well with more extensive fine-tuning.
I will share it at some point. I’m working on an enhanced model that handles additional kinds of rephrasing. Ninety percent of the work is preparing the training data properly.