Will GPT-3 understand data better with HTML tags?

I’m wondering if GPT-3 can use basic HTML tags to better understand the content.
The idea is to feed GPT-3 chunks of text from a webpage.

The use case is to send webpage content in the prompt and ask GPT-3 some questions about it.

Option 1: get pure text out of the webpage
Option 2: keep basic HTML tags like <h1>, <h2>, <p>, etc., and remove token-consuming content such as CSS, plus non-text tags such as empty <div> or <section> elements.

I was thinking GPT-3 might use these tags to better understand the structure, such as titles and subtitles.
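
To make Option 2 concrete, here is a rough sketch of what the cleanup step might look like, using only Python's standard-library `html.parser`. The names `KEEP_TAGS`, `SKIP_TAGS`, `HtmlSimplifier` and `simplify_html` are made up for illustration, and the choice of which tags to keep is an assumption, not a tested recommendation. It does not handle every edge case (for example, malformed or deeply nested markup).

```python
from html.parser import HTMLParser

# Assumption: these are the "basic" structural tags worth keeping.
KEEP_TAGS = {"h1", "h2", "h3", "p", "li"}
# Tags whose content is never useful text for the prompt.
SKIP_TAGS = {"script", "style", "noscript"}


class HtmlSimplifier(HTMLParser):
    """Keeps text wrapped in a few structural tags and drops everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # finished output fragments
        self.skip_depth = 0   # > 0 while inside a skipped tag
        self.open_tag = None  # the kept tag currently being filled
        self.buffer = []      # text collected since the last flush

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in KEEP_TAGS:
            self._flush()
            self.open_tag = tag  # attributes (class, style, ...) are dropped
        elif self.open_tag is None:
            self.buffer.append(" ")  # separate text of unrelated containers

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == self.open_tag:
            self._flush()
        elif self.open_tag is None:
            self.buffer.append(" ")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buffer.append(data)

    def _flush(self):
        # Collapse whitespace; elements with no text produce no output.
        text = " ".join("".join(self.buffer).split())
        if text:
            if self.open_tag:
                self.parts.append(f"<{self.open_tag}>{text}</{self.open_tag}>")
            else:
                self.parts.append(text)
        self.buffer = []
        self.open_tag = None


def simplify_html(html: str) -> str:
    parser = HtmlSimplifier()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text
    return "\n".join(parser.parts)
```

Inline tags such as <b> inside a kept element lose their markup but keep their text, and empty containers disappear entirely. Whether the remaining tags are worth their tokens is exactly the open question, so it may be worth comparing this output against plain text on a few pages.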

What are your thoughts?

I am currently using Davinci 2 for extracting information (used for content summarization and categorisation) from full raw website pages.

The results are excellent with or without HTML.

Example (the prompt is the page content plus the instruction below it; the completion follows):

<“homepage content”>

Now we read the information on the homepage and list all of the categories and tags for business category, business facilities, business features, equipment, security, parking, opening times, and classes etc at this location, as a csv:

Business category: Gym
Business facilities: Cardio equipment, weights, group classes, personal training
Business features: 24/7 access, security, hygiene standards
Equipment: Cardio equipment, weights
Security: 24-hour security, secure key access
Parking: Yes
Opening times: 24/7
Classes: Group classes
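
If it helps, the example above could be wired up roughly like this. This is only a sketch: `build_prompt` and `parse_completion` are hypothetical helper names, the actual call to the model is left out, and the parser assumes the completion comes back as one "Label: value, value" line per category, as in the output shown here.

```python
# The instruction text is copied from the example prompt above.
INSTRUCTION = (
    "Now we read the information on the homepage and list all of the "
    "categories and tags for business category, business facilities, "
    "business features, equipment, security, parking, opening times, "
    "and classes etc at this location, as a csv:"
)


def build_prompt(page_content: str) -> str:
    """Page content first (raw HTML or plain text), then the instruction."""
    return f"{page_content.strip()}\n\n{INSTRUCTION}\n\n"


def parse_completion(completion: str) -> dict[str, list[str]]:
    """Turn 'Label: a, b' lines into {'Label': ['a', 'b']}."""
    result: dict[str, list[str]] = {}
    for line in completion.splitlines():
        # Split on the first colon only, so values like '9:00' stay intact.
        label, sep, values = line.partition(":")
        if not sep or not label.strip():
            continue  # skip blank lines and lines without a label
        result[label.strip()] = [v.strip() for v in values.split(",") if v.strip()]
    return result
```

Because the model is free to change the layout of its answer, the parser skips anything it does not recognise rather than failing.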
