Customer service assistant inferred from screenshots

Hi, we took a few screenshots from our web application, added some brief descriptive text about what it does, fed it to regular ChatGPT 4+ and asked it to analyze the screenshots. Then we asked it a few user questions of the “How do I do this?” variety. It did extremely well!

Would this process scale, if we took 50-100 screenshots, stuffed them into files and fed them as the initial file list for an Assistant? The 20 file limit, 512MB per file is huge - I imagine we could fit everything we needed in a single PDF file of 5-10 MB. The user conversations would be short, we are not looking for it to solve complex questions.

Any feedback on the feasibility of this would be appreciated and whether the “stuffing all screenshots and descriptive text into one PDF” is the best way to feed a file to the Assistant.

Thank you.

Edits - Work Plan For Customer Service Assistant
I will update this section as new information learned.
Open issues/notes for myself in italics below

Create Training Data

  • Generate screenshots of web application
  • Host screenshots on a web site that ChatGPT can access (Use GUID parameter in URL for a minimal security layer?)
  • Create a single markdown file, in which each screenshot’s link is provided followed by a descriptive annotation for that image.
  • Alternately, use API which shows how to send an image URL and associated “what does this image contain?” text.
  • The total text will be far less than the million words mentioned by Jay F. But how are screenshot images counted in this mix (words/tokens)? (Answer: cost info provided in vision URL above.)

Create Assistant

  • Feed the single markdown file to ChatGPT via API
  • ChatGPT will read the text of the markdown file, and will also fetch and analyze each linked screenshot, adding it to its knowledge base (Is this correct? Or do the images need to be fed separately, or as a folder - but then how to tie each image to its description?)
  • How to update the Assistant, do we update the main file and screenshots and make a new one, or do we keep incrementally giving the same Assistant new information and corrections?

Chatbot Integration & Development

  • Create Chatbot that will, for each user, open a new Thread to the Assistant and send Messages to it for that user (Find open source chatbot code to do this)
  • This will use private API key so that others cannot use this Assistant (Right?)
  • This will prevent one user from seeing another user’s chats (Ideally)
  • Can the bot refer the user back to the images it used in its answer?
  • Do we log all messages to review if it’s doing a good job?
  • Does each new user Thread start with a “fresh clone” of the trained Assistant, or does each new user Thread continue the same assistant - raising risk of memory loss after some time?

Cost issues

  • Should each user/customer’s interactions be capped?
  • How to prevent the user from just chatting with the support bot about life? (Limit the number of messages in a Thread?)
  • Or will all messages fail once our company’s monthly cap is reached?

PDF is not an AI-friendly format. It just wastes effort.

The 512MB per file is not realized, it caps out around 1 million words.

I would just open up notepad++ beside your ChatBot, and start pasting the text. Advanced level: include the URL of associated images you’ve put on the web and make them linked with annotations for viewing. You can format the plain text file in the markdown that you receive from a copy button in ChatGPT Plus.

gpt-4-vision-preview AI model is also available on API. You could just batch a directory of images for description, prompt the vision AI with long app description of what is already documented, and compile your text file automatically (although I would review everything to see where the AI computer vision lost the plan).

Wow, that was an extremely useful reply: what NOT to do, what TO do, and tips on HOW to do it. Thank you! In particular, I didn’t realize that pasting ChatGPT output now results in markdown.

I’ve decided to update my original post with a draft work plan, starting with your tips, and keep editing it as I learn more either from more replies or on my own. If you happen to read and feel like adding suggestions in this thread, they would be greatly welcomed.

Interesting idea!
We are currently working on simillar project, just instead of screenshot we working with users manuals for devices.

Some tips for cost issues:

  • you can limit the numbers of questions per user (and refresh after 24hrs)
  • you can limit the topics that the assistant can talk about only those that are contained in his datasets
1 Like