MS Word Add-In - Rephrasing

This project is open source and requires Visual Studio. A Jupyter Notebook is available in the GitHub project.

Word Add-In Example of Sentence Rephrasing:


Word currently has a “Rewrite Selection” feature, available by right-clicking a sentence. The typical result is “No suggestions.” This seemed like an ideal application of GPT-3’s skills, so I decided to write an Add-In for Word to see whether it could be done better.

I fine-tuned a Curie model on a subset of data taken from TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages, which is licensed under Creative Commons Attribution 2.0 Generic. A custom program converted the data into a form the API could accept. I kept only sentence pairs that shared fewer than 70% of their words. I applied this filter because the dataset contains many what I would call inconsequential rephrasings (small changes in verb tense, or the same words moved around). I ended up with 66,400 sentence pairs for fine-tuning. The Add-In was coded in Visual Studio 2019.
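The post doesn't say exactly how the 70% word overlap was measured. Here is a minimal sketch of such a filter. It assumes overlap is the share of the shorter sentence's unique lowercase words that also appear in the other sentence. That measure and the function names are assumptions, not necessarily what the author used.

```python
import re


def word_set(sentence: str) -> set[str]:
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def overlap_ratio(a: str, b: str) -> float:
    """Fraction of the smaller sentence's unique words that also appear in the other.

    ASSUMPTION: the original post does not define the overlap measure; this is one choice.
    """
    wa, wb = word_set(a), word_set(b)
    if not wa or not wb:
        return 1.0  # treat empty input as a trivial pair so it gets rejected
    return len(wa & wb) / min(len(wa), len(wb))


def keep_pair(a: str, b: str, threshold: float = 0.70) -> bool:
    """Keep only pairs that share fewer than `threshold` of their words."""
    return overlap_ratio(a, b) < threshold
```

With this measure, a tense change such as `"He is eating an apple."` vs `"He was eating an apple."` scores 0.8 and is dropped. By contrast, `"I am very tired."` vs `"I'm exhausted."` scores 0 and is kept. Dividing by the smaller sentence is a design choice: it treats a short sentence fully contained in a longer one as a near-duplicate. Jaccard similarity would be a reasonable alternative.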

Along the way, Codex helped with several coding challenges. I used it to generate Python code to escape the JSON strings properly. I also used the JavaScript Playground to generate some functions, such as counting the words in a string and changing button text. Things like this aren't hard to look up, but I found Codex faster for many problems. It is a slightly new way of thinking about coding, but it feels quite natural to me.
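The post doesn't include the escaping code. Here is a minimal sketch of the idea. It assumes the prompt/completion JSONL layout that the legacy OpenAI fine-tuning endpoint accepted for base models such as Curie. Python's `json.dumps` escapes quotes, backslashes and newlines, so each pair becomes exactly one valid JSON line. The `SEPARATOR` and `STOP` values and the function names are illustrative assumptions, not values from the project.

```python
import json
from typing import Iterable, Tuple

# ASSUMPTIONS: illustrative separator/stop conventions, not taken from the original project.
SEPARATOR = "\n\n###\n\n"
STOP = " END"


def to_jsonl_line(original: str, rephrased: str) -> str:
    """Serialize one sentence pair as a single-line prompt/completion JSON object.

    json.dumps escapes quotes, backslashes and newlines, so the result contains
    no raw line breaks and is always valid JSON.
    """
    record = {
        "prompt": original.strip() + SEPARATOR,
        "completion": " " + rephrased.strip() + STOP,
    }
    return json.dumps(record, ensure_ascii=False)


def write_jsonl(pairs: Iterable[Tuple[str, str]], path: str) -> None:
    """Write one JSON object per line (JSONL), UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for original, rephrased in pairs:
            f.write(to_jsonl_line(original, rephrased) + "\n")
```

For example, `to_jsonl_line('She said "hi".', "She greeted them.")` produces a single line in which the embedded quotes are escaped as `\"` and the separator's newlines as `\n`.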

Documenting progress in a Jupyter Notebook was helpful, and the notebook was also useful for tuning the parameters passed to the API to find the right mix. Overall, I think the result is pretty good, and much better than Word’s built-in function. I plan to develop a set of author tools like this to help people improve their writing.


Wonderful. Great job.


Very cool use case! I’m curious: couldn’t you use the CLI data preparation tool to prepare the data for fine-tuning your model?


Thanks. I did use it for the last step. The Python code to escape the JSON strings was needed because the CLI crashed when trying to read the file. The original data file also had extra fields and information that had to be stripped out. The sentences weren't in pairs either: each sentence was in its own row with a reference number, and some sentences had several alternative phrasings.
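Turning rows like these into training pairs could look roughly like the sketch below. It assumes each row has already been reduced to a `(reference_id, sentence)` tuple once the extra fields are stripped. The function name and that row layout are hypothetical, not the author's program.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


def rows_to_pairs(rows: Iterable[Tuple[Hashable, str]]) -> Iterator[Tuple[str, str]]:
    """Group (reference_id, sentence) rows and yield every unordered pair within a group.

    ASSUMPTION: extra fields were already stripped, leaving (reference_id, sentence).
    """
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for ref_id, sentence in rows:
        sentence = sentence.strip()
        # Skip blank sentences and exact repeats within the same reference group.
        if sentence and sentence not in groups[ref_id]:
            groups[ref_id].append(sentence)
    for sentences in groups.values():
        # A group with k alternative phrasings yields k*(k-1)/2 pairs;
        # a group with a single sentence yields none.
        for a, b in combinations(sentences, 2):
            yield a, b
```

Pairing every sentence in a group with every other is one way to use the sentences that have several alternative phrasings. Emitting each pair in both directions would be a further option for enlarging the training set.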


Perfect!

In the past I used a freemium browser extension (I don’t remember the name now) to rephrase content for website copywriting. This would be a great utility for doing it all in MS Word.

When are you going to release it?

I can participate in beta testing.


I’m debating whether to monetize it or open-source it. With monetization, the GPT-3 API fees may be too high, unpredictable, or both. Imagine someone who pays a fixed monthly fee of, say, $5 but uses $100 worth of tokens through the API.

The rephrasing Add-In works quite well, and I have some ideas for making it even better. Sometimes it generates phrases that don’t have the same meaning, but I think that can be fixed (or greatly reduced).

For monetization, yes, this is a genuine problem. I would do something like this:

  • One-time fee to install the plugin
  • Prepaid credits for API usage (keeping a percentage of them to cover services)
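To make the prepaid-credit idea concrete: credits bound the exposure, because a request is refused once the balance can't cover it. That rules out the $5-fee/$100-usage scenario. Here is a minimal sketch. The class, the per-1K-token price and the margin are hypothetical placeholders, not real API rates.

```python
from dataclasses import dataclass


@dataclass
class CreditAccount:
    """Hypothetical prepaid-credit account; prices and margin are placeholders."""

    balance_usd: float
    price_per_1k_tokens: float  # what the API provider charges (placeholder value)
    margin: float = 0.20        # share added on top of API cost to cover services (placeholder)

    def cost_of(self, tokens: int) -> float:
        """Marked-up price of a request that consumes `tokens` tokens."""
        return tokens / 1000 * self.price_per_1k_tokens * (1 + self.margin)

    def charge(self, tokens: int) -> bool:
        """Deduct the marked-up cost if the balance covers it; otherwise refuse the request."""
        cost = self.cost_of(tokens)
        if cost > self.balance_usd:
            return False
        self.balance_usd -= cost
        return True
```

Because usage can never exceed what was prepaid, the margin only has to cover service costs rather than also absorb heavy users.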

That could work. I’m also considering fine-tuning one of the less expensive models. I’m currently working with a fine-tuned Curie model, but perhaps one of the others would work well with more extensive fine-tuning.

Interesting. Could you share the Jupyter Notebook for the process?


I will share it at some point. I’m working on an enhanced model that does additional kinds of rephrasing. About 90% of the work is preparing the training data properly.

I have open-sourced the project. The Jupyter Notebook is in the project.


This project has been open-sourced. The code, a Visual Studio project, is available on GitHub.


I have open-sourced the project. Check the GitHub link at the top of the original post.

That’s brilliant! :rocket:

I’m going to try it as soon as I’m free.

Thank you for the share :+1:
