HTML in text uploaded via the Files API

ddrechsler · March 6, 2022, 11:58pm

If the content of the file I upload to help train OpenAI has HTML embedded in it (e.g. span, div, etc.), is that going to confuse the system? Should I be stripping all the HTML out first before I provide it?

I'm essentially feeding OpenAI the contents of our website to get it to answer questions about it, but I'm not sure whether I should grab the text only (the API to my CMS returns JSON that has all the HTML included).
Using the example in the docs:
{"text": "puppy A is happy", "metadata": "emotional state of puppy A"}
{"text": "puppy B is sad", "metadata": "emotional state of puppy B"}

Would this still work?
{"text": "puppy A <span>is happy</span>", "metadata": "emotional state of puppy A"}
{"text": "puppy <span>B is sad</span>", "metadata": "emotional state of puppy B"}

Or would the span tags just confuse the hell out of it?
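Whether the tags help or hurt the model is a separate question, but at the file-format level they are harmless: an HTML tag is just characters inside a JSON string. A minimal sketch, assuming Python and only the standard `json` module, of writing records like the ones above as one JSON object per line (`to_jsonl` is a hypothetical helper name, not part of any OpenAI library):

```python
import json

# Hypothetical records shaped like the example from the docs; the text
# still contains inline HTML, as it would straight from a CMS API.
records = [
    {"text": "puppy A <span>is happy</span>", "metadata": "emotional state of puppy A"},
    {"text": "puppy <span>B is sad</span>", "metadata": "emotional state of puppy B"},
]


def to_jsonl(rows):
    """Serialize each record as one JSON object per line (JSON Lines)."""
    # json.dumps escapes quotes and backslashes inside the text for us;
    # characters such as < and > are left as they are.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


jsonl = to_jsonl(records)
print(jsonl, end="")

# Round trip: every line parses back to the original record, tags included.
assert [json.loads(line) for line in jsonl.splitlines()] == records
```

To produce the actual upload file, the string can be written out with `open("data.jsonl", "w", encoding="utf-8")`; the file name here is only an assumption.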

sps · March 7, 2022, 10:34am

Hi @ddrechsler,

This is a very interesting problem. In my opinion, if the purpose is to help users find information listed on your website, you won't need the HTML. However, if the purpose is to help users navigate the site to get something done, that's a different problem altogether.

Also, while we're on the subject, a better way to help users find information would be something like Google Custom Search, because it can handle updates to the website on its own, whereas with GPT-3 you'd have to fine-tune again after every update to pick up the changes.

ddrechsler · May 4, 2022, 12:51am

In the end, I threw the HTML at some Python code that strips out just the text, then sent that to OpenAI. It's working well, and I now have a chatbot that can answer questions directly from the text of our website. I do have to whitelist the questions it can answer successfully, but you can use the knowledge bases in Dialogflow for that purpose. It's kind of cool: like the ultimate website search, where it answers from the content of the pages.
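The thread doesn't include the Python code used for this step. As a minimal sketch of the same idea, assuming only the standard library, `html.parser.HTMLParser` can collect a page's text nodes while skipping `<script>` and `<style>` contents. `TextExtractor` and `html_to_text` are hypothetical names, not the poster's code; BeautifulSoup's `get_text()` is a common third-party alternative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


print(html_to_text('<div><p>puppy A <span>is happy</span></p>'
                   '<script>var x = 1;</script><p>Tom &amp; Jerry</p></div>'))
# → puppy A is happy Tom & Jerry
```

Text nodes are joined with a space so that adjacent block elements such as `<p>Hello</p><p>World</p>` don't run together; the trade-off is that a word split by an inline tag (`hap<b>py</b>`) comes out as two words. The cleaned string can then go into the `"text"` field of each record before uploading.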
