Web scraping project with AI

I’m currently working on a project where I need to download a large number of PDFs from different websites. These PDFs are technical sheets for various investment funds.

The PDFs are uploaded monthly, so I need a way to download the file for a specific month and year.

I have an Excel sheet that contains information for each fund, including the administrator, fund name, and the link to the page where the technical sheets are published, formatted like this:
ADMIN, FUND NAME, TECHNICAL SHEETS LINK

Using this data, I just need to visit each link and download the corresponding PDF for the required month.
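A minimal sketch of the loop's starting point, assuming the Excel sheet has been exported to CSV with exactly the three headers above (reading the .xlsx directly, for example with `pandas.read_excel`, would work too). `FundRow`, `load_funds` and the example.com row are hypothetical placeholders, not real data:

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FundRow:
    admin: str
    fund_name: str
    sheets_link: str


def load_funds(csv_text: str) -> list:
    """Parse rows with the columns ADMIN, FUND NAME, TECHNICAL SHEETS LINK."""
    # skipinitialspace handles the ", " separators used in the header
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return [
        FundRow(
            admin=row["ADMIN"].strip(),
            fund_name=row["FUND NAME"].strip(),
            sheets_link=row["TECHNICAL SHEETS LINK"].strip(),
        )
        for row in reader
    ]


sample = """ADMIN, FUND NAME, TECHNICAL SHEETS LINK
Example Admin, Example Growth Fund, https://example.com/funds/growth
"""
for fund in load_funds(sample):
    print(fund.fund_name, "->", fund.sheets_link)
```

Each `FundRow` then drives one page visit and one download attempt for the target month.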

The main challenge is that each website is different, so there’s no single solution that works for all of them.

My current approach is to fetch the HTML content of each page and search for the correct PDF link using the following keywords:

  • Year
  • Month
  • Fund name
  • Phrases like “Technical sheet” or variations of it
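The keyword search above can be turned into a ranking rather than a single yes/no match, which makes it easier to see how close each candidate link came. A sketch under these assumptions: the page HTML has already been fetched, only `<a>` tags are considered, month names are English, the phrase list and the point weights are arbitrary illustrative choices, and all helper names are hypothetical. It uses the standard-library `html.parser` rather than a third-party parser:

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import unquote, urljoin

# English month names only; pages in other languages need their own list.
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]


class LinkCollector(HTMLParser):
    """Collects (absolute href, visible link text) pairs from <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href: Optional[str] = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join(" ".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text))
            self._href = None


def has_token(token: str, haystack: str) -> bool:
    """Match token only when not glued to neighbouring letters (or digits)."""
    cls = r"\d" if token.isdigit() else "[a-z]"
    return re.search(rf"(?<!{cls}){re.escape(token)}(?!{cls})", haystack) is not None


def score_link(href, text, fund_name, year, month,
               phrases=("technical sheet", "fact sheet", "factsheet")):
    """Heuristic score: higher means more likely to be the wanted PDF."""
    haystack = f"{unquote(href)} {text}".lower().replace("_", " ").replace("-", " ")
    score = 0
    if has_token(str(year), haystack):
        score += 3
    name = MONTHS[month - 1]
    if any(has_token(t, haystack) for t in (name, name[:3], f"{month:02d}")):
        score += 3
    words = [w for w in re.findall(r"[a-z0-9]+", fund_name.lower()) if len(w) > 2]
    if words:
        # up to 4 points, proportional to how many fund-name words appear
        score += round(4 * sum(has_token(w, haystack) for w in words) / len(words))
    if any(p in haystack for p in phrases):
        score += 2
    if ".pdf" in haystack:
        score += 1
    return score


def rank_links(html, base_url, fund_name, year, month):
    """Return [(score, url), ...] for every link on the page, best first."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    scored = [(score_link(h, t, fund_name, year, month), h) for h, t in collector.links]
    return sorted(scored, reverse=True)


page = """
<a href="/docs/growth_fund_factsheet_march_2024.pdf">Growth Fund - Fact sheet March 2024</a>
<a href="/docs/growth_fund_factsheet_february_2024.pdf">Growth Fund - Fact sheet February 2024</a>
<a href="/about">About us</a>
"""
best = rank_links(page, "https://example.com/funds/", "Example Growth Fund", 2024, 3)
print(best[0])
```

Keeping the full ranked list, not just the top hit, is what later allows a "found nothing" or "too close to call" decision. Note that pages which build their link list with JavaScript will not expose those links in the raw HTML at all.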

However, I’m running into a lot of issues with this method. It doesn’t work consistently across all websites, and when the correct PDF isn’t found, there’s no reliable way to detect that it failed.
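One way to make failures detectable is to never accept a link blindly: require a minimum score and a clear margin over the runner-up, and check that the downloaded bytes are actually a PDF. A sketch assuming candidates arrive as a best-first list of `(score, url)` pairs; the thresholds `min_score=8` and `min_margin=2` and the names `Decision`, `pick_candidate` and `looks_like_pdf` are hypothetical and would need tuning against real pages:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    url: Optional[str]
    status: str   # "ok", "no_match" or "ambiguous"
    reason: str


def pick_candidate(ranked, min_score=8, min_margin=2):
    """ranked: [(score, url), ...] sorted best first."""
    if not ranked or ranked[0][0] < min_score:
        return Decision(None, "no_match", "no candidate reached the minimum score")
    if len(ranked) > 1 and ranked[0][0] - ranked[1][0] < min_margin:
        return Decision(None, "ambiguous", "top two candidates score too close")
    return Decision(ranked[0][1], "ok", f"score {ranked[0][0]}")


def looks_like_pdf(data: bytes) -> bool:
    """Reject HTML error pages or login walls saved under a .pdf name."""
    # PDF files start with a "%PDF-" header; search the first 1 KB to be lenient.
    return b"%PDF-" in data[:1024]


ranked = [(13, "https://example.com/docs/march_2024.pdf"),
          (10, "https://example.com/docs/february_2024.pdf")]
print(pick_candidate(ranked))
print(looks_like_pdf(b"%PDF-1.7\n"), looks_like_pdf(b"<!DOCTYPE html><html></html>"))
```

Writing every `Decision` to a report means "no_match" and "ambiguous" funds go to manual review instead of silently producing a wrong file. A stronger check would extract the PDF's text (for example with a library such as pypdf) and confirm the fund name and month appear inside it.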

If you know of any ideas or tools, especially AI-based ones, that could make this process more efficient, I’d really appreciate your input.
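A sketch of how a language model could be slotted in as a fallback when keyword matching is inconclusive: give it the page's candidate links as a numbered list and accept only an index into that list, so it cannot invent a URL. `ask_llm` is a hypothetical callable wrapping whichever model API is used; the prompt wording and the JSON answer format are assumptions, and the example uses a fake model for illustration:

```python
import json
from typing import Callable, List, Optional, Tuple


def build_prompt(fund_name: str, year: int, month: int,
                 links: List[Tuple[str, str]]) -> str:
    """Numbered list of (url, link text) candidates plus a strict answer format."""
    lines = [f"{i}: {text} | {url}" for i, (url, text) in enumerate(links)]
    return (
        f"Which link is the technical sheet PDF for the fund '{fund_name}' "
        f"for {year}-{month:02d}?\nCandidates:\n" + "\n".join(lines)
        + '\nAnswer only with JSON like {"index": 3}, or {"index": null} if none match.'
    )


def choose_with_llm(ask_llm: Callable[[str], str], fund_name: str, year: int,
                    month: int, links: List[Tuple[str, str]]) -> Optional[str]:
    """Return the chosen URL, or None if the model's answer is unusable."""
    reply = ask_llm(build_prompt(fund_name, year, month, links))
    try:
        index = json.loads(reply).get("index")
    except (json.JSONDecodeError, AttributeError):
        return None  # not JSON, or JSON that is not an object
    # Only accept an index into the list we sent.
    if isinstance(index, int) and not isinstance(index, bool) and 0 <= index < len(links):
        return links[index][0]
    return None


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return '{"index": 1}'


links = [("https://example.com/a.pdf", "Fact sheet February 2024"),
         ("https://example.com/b.pdf", "Fact sheet March 2024")]
print(choose_with_llm(fake_llm, "Example Growth Fund", 2024, 3, links))
```

Because an unparseable or out-of-range answer returns `None`, the model's failures surface as explicit "not found" results rather than wrong downloads.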