I’m currently exploring different prompts for text editing and I’m wondering if any of you have recommendations for a prompt that can remove filler words from text without summarizing it.
I’m specifically looking for a prompt that can identify and remove words such as “um”, “uh”, “like”, “you know”, and similar phrases that add little value to the text and can be distracting or detract from its overall quality. Everything I’ve tried so far (few-shot prompting, GPT-4 “personas”, and a lot of other prompts) has yielded a lot of summarization from the AI. I want to preserve all the context and only slightly tweak the input so it reads a little more formally.
The input will be sections of interview transcripts (i.e., the questions asked and the answers provided).
@sps @anon10827405, thank you! It’s a little more than just stop words though. The input looks a bit more like this:
“They don’t, for instance, some things, some of their online exercises don’t work when they try to open them.”
I’d like to have this rewritten as “Some of their online exercises don’t work” (optionally “when they try to open them”).
More broadly, it’s real dialogue (as it comes from interview transcripts), so what we want to remove isn’t as simple or straightforward as an array of filler words. That’s why I was thinking of leaning on GPT. Is there a good way to do this directly in code?
Additionally, I’ve noticed that issues mostly arise when we try to clean multiple question/response pairs at the same time. If we do them one at a time, it works fairly well, but with more than three at once (n > 3), it starts summarizing a lot. Cleaning them one at a time, however, is unwieldy and inefficient.
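On the question of doing this directly in code: a minimal sketch of a regex baseline, assuming a hand-picked filler list (`FILLERS` and `strip_fillers` are illustrative names, not from any library). It only handles fixed fillers such as “um” or “you know”. It leaves “like” alone because that word is often meaningful, and it cannot repair false starts such as “They don’t, for instance, some things, ...”, which is where a model is still needed.

```python
import re

# Assumed filler list; extend or trim it for your own transcripts.
FILLERS = ["um", "uh", "er", "you know", "I mean"]

# Match a filler as a whole word or phrase, plus an optional trailing comma and spaces.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*",
    flags=re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove the listed fillers and tidy up leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s+([,.!?])", r"\1", cleaned)  # no space before punctuation
    cleaned = re.sub(r"\s{2,}", " ", cleaned)         # collapse repeated spaces
    return cleaned.strip()


print(strip_fillers("Um, I think, uh, it works, you know, most of the time."))
# prints: I think, it works, most of the time.
```

Running this first on each answer is cheap and never summarizes, so it may reduce how much the model has to change.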
1. Break the text into sentences.
2. For each sentence, do this: describe and experiment with what you want done.
Once you're done, review your work and check that the above instructions have been followed correctly.
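The outline above can also be driven from code, so the model only ever sees one sentence at a time. This is a rough sketch, not a tested recipe: `PROMPT_TEMPLATE` is a hypothetical wording for step 2, the sentence splitter is deliberately naive, and `complete` is a placeholder for whatever function sends a prompt to your model and returns its reply.

```python
import re
from typing import Callable, List

# Hypothetical wording for step 2; this is the part to experiment with.
PROMPT_TEMPLATE = (
    "Rewrite the sentence below so it reads cleanly.\n"
    "Remove filler words and false starts. Keep every fact and detail.\n"
    "Do not summarize. Once you're done, review your work and check that\n"
    "these instructions were followed. Return only the rewritten sentence.\n\n"
    "Sentence: {sentence}"
)


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def clean_answer(answer: str, complete: Callable[[str], str]) -> str:
    """Clean one interview answer sentence by sentence.

    `complete` is a placeholder: it takes a prompt string and returns the
    model's text reply. Plug in your own API call here.
    """
    cleaned = []
    for sentence in split_sentences(answer):
        reply = complete(PROMPT_TEMPLATE.format(sentence=sentence))
        cleaned.append(reply.strip())
    return " ".join(cleaned)
```

Because each call carries a single sentence, there is little room for the model to merge or drop content, at the cost of more requests per transcript.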
I’m not going too much into the details of what you need done in step 2, because you’ll need to test with your data and see what works best.
You’ll have to take time to refine it according to your data and desired output. Alternatively, you can spend some time creating a training dataset of prompts and desired completions, and fine-tune a base model on it.
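For the fine-tuning route, a small sketch of how such a dataset could be written out, assuming the one-JSON-object-per-line layout with `prompt` and `completion` keys that is commonly used for base-model fine-tuning (check your provider's current format before relying on it). The helper name `to_jsonl` is illustrative, and the single example pair is the one quoted earlier in this thread.

```python
import json
from typing import Iterable, Tuple


def to_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize (raw, cleaned) pairs as JSON Lines, one record per line."""
    lines = []
    for raw, cleaned in pairs:
        # Key names are an assumption; rename them to match your provider.
        record = {"prompt": raw, "completion": cleaned}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


examples = [
    (
        "They don’t, for instance, some things, some of their online "
        "exercises don’t work when they try to open them.",
        "Some of their online exercises don’t work when they try to open them.",
    ),
]

jsonl_text = to_jsonl(examples)
print(jsonl_text, end="")
```

A real dataset would need many more hand-cleaned pairs than this, drawn from your own transcripts.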