Hey everyone!
I’d love to share something I’ve been working on.
The Idea:
I wanted to build a simple price comparison tool where a user sees a unified product card with prices from multiple sellers.
The challenge: suppliers often list the same product under different names, SKUs, and categories.
For example:
- One store might list: “iPhone 16 128GB Ultramarine” under Mobile Phones → iPhone
- Another store might list: “iPhone 16 128 gb Blue” under Smartphones
These are the same product, but they look completely different on paper, and matching them up usually requires a human to sort through the listings manually.
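To make the mismatch concrete, here is a minimal sketch of the kind of text normalization that could run before any AI comparison. It is written in Python for brevity (the prototype itself is PHP), and `normalize_title` and `shared_tokens` are hypothetical helper names, not part of the actual system.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase a title, drop quotes, and join storage sizes ('128 gb' -> '128gb')."""
    text = title.lower()
    text = re.sub(r"[“”\"']", "", text)                     # strip straight and curly quotes
    text = re.sub(r"(\d+)\s*(gb|tb|mb)\b", r"\1\2", text)   # unify '128 gb' and '128GB'
    return re.sub(r"\s+", " ", text).strip()                # collapse repeated whitespace

def shared_tokens(a: str, b: str) -> set:
    """Return the words two titles have in common after normalization."""
    return set(normalize_title(a).split()) & set(normalize_title(b).split())

if __name__ == "__main__":
    print(normalize_title("iPhone 16 128 gb Blue"))
    print(shared_tokens("iPhone 16 128GB Ultramarine", "iPhone 16 128 gb Blue"))
```

Normalization lines up the model and storage size, but "Ultramarine" and "Blue" still don't match as plain text, which is why a semantic step is needed on top.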
The MVP I built:
My goal was to create a system that can automatically detect when two products are the same, even with different names/categories.
Technologies I used:
- ChatGPT + RAG + batching — used GPT to compare product pairs and decide if they’re identical. I optimized for cost since supplier lists can be huge.
- PostgreSQL — to store product data
- PHP — my main language, used to build the prototype
- Elasticsearch — to support vector search / semantic similarity
- RabbitMQ — for background processing of imports and comparisons
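Roughly, the cost optimization works like this: vector search narrows each new product down to a few likely candidates, and only those pairs are sent to GPT, several per request. Below is a simplified Python sketch of that idea, not the actual PHP code. The in-memory cosine search stands in for Elasticsearch, and `candidate_pairs`, `batched`, `min_score`, and `top_k` are illustrative names and values I made up for this example.

```python
import math
from itertools import islice

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def candidate_pairs(new_items, catalog, min_score=0.8, top_k=3):
    """For each new product, keep only the top_k catalog entries above min_score."""
    pairs = []
    for new_id, new_vec in new_items.items():
        scored = sorted(
            ((cosine(new_vec, vec), cat_id) for cat_id, vec in catalog.items()),
            reverse=True,
        )
        for score, cat_id in scored[:top_k]:
            if score >= min_score:
                pairs.append((new_id, cat_id))
    return pairs

def batched(pairs, size):
    """Yield lists of at most `size` pairs, so one request can cover many comparisons."""
    it = iter(pairs)
    while chunk := list(islice(it, size)):
        yield chunk

if __name__ == "__main__":
    catalog = {"a": [1.0, 0.0], "b": [0.0, 1.0]}   # toy 2-d embeddings
    new_items = {"x": [0.9, 0.1]}
    for batch in batched(candidate_pairs(new_items, catalog), size=20):
        print(batch)
```

Filtering by similarity first means the number of GPT calls grows with the number of plausible matches rather than with every possible pair of products.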
In my internal testing, the AI matching reached about 85% accuracy. There were some false positives, but filtering on the model's logprobs let me identify uncertain cases and flag them for human review.
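For anyone curious what logprobs filtering can look like, here is a minimal Python sketch. It assumes the model is asked to answer each pair with a single "yes" or "no" token and that the API returns that token's log probability. The `triage` function and the 0.9 threshold are illustrative choices, not the exact values from my prototype.

```python
import math

def triage(answer: str, logprob: float, threshold: float = 0.9) -> str:
    """Turn a yes/no answer plus its token logprob into match / different / review."""
    confidence = math.exp(logprob)   # a logprob is ln(p), so exp() recovers p
    if confidence < threshold:
        return "review"              # model was unsure: send to a human
    return "match" if answer.strip().lower() == "yes" else "different"

if __name__ == "__main__":
    print(triage("yes", -0.01))   # high confidence
    print(triage("yes", -0.5))    # low confidence
```

The threshold is a trade-off: raising it sends more pairs to manual review but lets fewer false positives through automatically.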
So, here’s where I got stuck:
- I don’t have funding or savings to scale further — GPT API isn’t free
- I don’t have a source of affiliate product feeds or open supplier data
- I don’t know what the next step should be
I’d really appreciate your thoughts — do you think this kind of system could be useful or interesting to anyone?
Thanks for reading!