Filler words prompt recommendations

Hey!

I’m currently exploring different prompts for text editing and I’m wondering if any of you have recommendations for a prompt that can remove filler words from text without summarizing it.

I’m specifically looking for a prompt that can identify and remove words such as “um”, “uh”, “like”, “you know”, and other similar phrases that don’t add much value to the text but can be distracting or detract from its overall quality. Everything I’ve tried so far (few shot prompting, GPT4 “personas”, and a lot of other prompts) have all yielded a lot of summarisation from the AI. I want to preserve all the context, but just slightly tweak the input to read a little more formally.

The input will be sections of interview transcripts (ie questions asked and the answer provided).

2 Likes

What you’re describing are stop words.

Does it have to be done through ChatGPT, or can you use something else?
Here’s an example using Python.

Otherwise, your only option (off the top of my head) would be giving it a list, or even making it a plugin.

2 Likes

Here’s a working example

3 Likes

@sps @anon10827405 – thank you! It’s a little more than just stop words though, it’s input that looks a bit more like this:

“They don’t, for instance, some things, some of their online exercises don’t work when they try to open them.”

I’d like to have this rewritten as “Some of their online exercises don’t work” (optionally “when they try to open them”).

More broadly it’s real dialogue (as it comes from interview transcripts), so what we want to remove isn’t perhaps as simple / straightforward as an array of filler words, that’s why I was thinking to lean on GPT. Is there a good way to do this directly in code?

Additionally, I’ve noticed that issues arise usually / mostly when we try and clean multiple question / responses at the same time. If we do them one-off, it works fairly well, but if we do n > 3, it starts summarizing a lot. It’s, however, unwieldy and inefficient to do the cleaning one off :frowning:

2 Likes

You can try something like this.

1. Break the text into sentences.
2. For each sentence do this: describe and experiment with what you want done.
Once you're done review your work and check that the above instructions have been followed correctly.

I’m not going too much into the details of what you need done in step 2, because you’ll need to test with your data and see what works best.

3 Likes

Try a one-shot or two-shot… ie give it a couple examples and it’ll do better, I bet.

Good luck.

1 Like

This is achievable. It required some iteration for the sentence you gave.

If you want to “batch”, you’ll have to append the system message at the end rather that the beginning.

You’ll have to take time to refine it according to your data and desired output. Alternatively you can spend some time in creating a training dataset with prompts and desired completions and fine-tune a base model.

2 Likes

Thank you @sps, I’ll try that out :slight_smile:

2 Likes

You can give this prompt a try:

I don’t use the “system” role (ever) but you could also try it with the system role.

Mr Sps can you please help me I do have a few questions to ask

Welcome @jhall0947

I recommend creating a separate topic on the forum.

1 Like