Unstructured.io vs. Textract and similar cloud services

Looking to compare traditional cloud services for text extraction with the new kid on the block, unstructured.io.

Do they work in a similar way, or are they completely different services?

Don’t know much about the cloud services, but I can say unstructured.io is way better than Beautiful Soup, IMHO, at least for JS/style-heavy HTML pages.
It integrates nicely with requests.get().
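
For anyone who wants to try that combination, here is a minimal sketch, assuming the `requests` and `unstructured` packages are installed. `partition_html` is the HTML entry point in `unstructured`; the helper name `page_text`, the 30-second timeout, and the optional `fetch`/`partition` parameters are my own choices for illustration (they just make it easy to swap in stubs), not part of either library.

```python
def page_text(url, fetch=None, partition=None):
    """Fetch a page and return its extracted text, one element per line.

    `fetch` and `partition` are hypothetical hooks for testing; by default
    they fall back to requests.get() and unstructured's partition_html().
    """
    if fetch is None:
        import requests  # third-party, assumed installed
        fetch = lambda u: requests.get(u, timeout=30)
    if partition is None:
        # third-party, assumed installed
        from unstructured.partition.html import partition_html
        partition = lambda html: partition_html(text=html)

    response = fetch(url)
    response.raise_for_status()  # fail early on HTTP errors

    elements = partition(response.text)
    # Each element stringifies to its text; drop whitespace-only ones.
    return "\n".join(str(el) for el in elements if str(el).strip())


# Usage (needs network access and both packages):
# print(page_text("https://example.com"))
```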

Thank you, that’s a useful bit of information. I imagine that’s what all the plugins are using under the hood?

Don’t know about that. It’s what I use in llmsearch (an unverified plugin, mostly for my personal use).

Looks like Textract is way ahead. I’ve tried the unstructured SDK (with local inference) on a couple of pretty simple, searchable PDF files:

For the first one, unstructured returned an empty response, while Textract handled it correctly.
The other one was correctly recognized by both. However, Textract has a Node SDK for working with responses and supports all the other major languages for making requests.
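
A rough sketch of how such a side-by-side check could be scripted, assuming `boto3` (with AWS credentials configured) and `unstructured` with its PDF extras are installed. `detect_document_text` and `partition_pdf` are real entry points; the helper names, the optional `client`/`partition` hooks, and `flag_empty` are made up for illustration. As far as I know, the synchronous Textract call is only suitable for single-page documents, and multi-page PDFs need the asynchronous API with the file in S3.

```python
def textract_lines(pdf_bytes, client=None):
    """Return the text of every LINE block Textract finds in the document."""
    if client is None:
        import boto3  # third-party, assumed installed and configured
        client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": pdf_bytes})
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def unstructured_lines(path, partition=None):
    """Return the non-empty element texts unstructured extracts from a PDF."""
    if partition is None:
        # third-party, assumed installed with PDF support
        from unstructured.partition.pdf import partition_pdf
        partition = lambda p: partition_pdf(filename=p)
    return [str(el) for el in partition(path) if str(el).strip()]


def flag_empty(results):
    """Map each extractor name to 'empty' or its line count (hypothetical helper)."""
    return {
        name: "empty" if not lines else f"{len(lines)} lines"
        for name, lines in results.items()
    }


# Usage (needs AWS credentials and a local PDF):
# with open("sample.pdf", "rb") as f:
#     t = textract_lines(f.read())
# u = unstructured_lines("sample.pdf")
# print(flag_empty({"textract": t, "unstructured": u}))
```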

I hope there will be a more comprehensive analysis of the differences when companies start adopting these tools.

I can believe that. For my use case, cost, latency, and privacy were the priorities, and I didn’t need PDF support.
I’ll have to take a look at Textract; sounds like a winner.