Need serious beta testers for TRL5: on-prem dataset cleaning pipeline using OpenAI API

Andrea_Lattanzi · March 14, 2026, 2:19am

Hi everyone,

I’m looking for a small number of serious technical beta testers for Purify Factory, a model-agnostic on-prem pipeline for cleaning noisy text datasets at scale before downstream AI use.

This is not an open-source release and not a casual “try it if you feel like it” beta.
This phase is meant to support TRL5 validation, so I’m specifically looking for testers who can run real, structured tests and provide useful, reproducible feedback.

What Purify Factory does:

cleans noisy multilingual text datasets
processes JSONL datasets with sentence or text fields
produces auditable output with original text, cleaned text, token usage, cost, and provider metadata
supports multiple providers: OpenAI, Anthropic, Gemini, and local backends
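
To make the input and output shape above concrete, here is a minimal sketch of what one auditable record could look like. It is not Purify Factory's actual code or schema: the output key names (`original`, `cleaned`, `provider`, `tokens`, `cost`) and the helper functions are hypothetical, and the cleaning step is a whitespace-collapsing placeholder standing in for a real provider call.

```python
import json


def pick_text(record):
    """Return the text to clean from a record's `sentence` or `text` field."""
    for field in ("sentence", "text"):
        if isinstance(record.get(field), str):
            return record[field]
    raise KeyError("record has neither a 'sentence' nor a 'text' field")


def placeholder_clean(text):
    # Stand-in for the provider call (assumption): only collapses whitespace.
    return " ".join(text.split())


def audit_record(line, provider="placeholder"):
    """Turn one JSONL line into a hypothetical audit record."""
    record = json.loads(line)
    original = pick_text(record)
    return {
        "original": original,
        "cleaned": placeholder_clean(original),
        "provider": provider,
        "tokens": None,  # unknown without a real provider response
        "cost": None,    # unknown without real token counts and pricing
    }


def process_jsonl(lines):
    """Process an iterable of JSONL lines, skipping blank ones."""
    return [audit_record(line) for line in lines if line.strip()]
```

A tester could compare the tool's real output against a baseline like this to check that every input record is accounted for and that the original text is preserved unchanged next to the cleaned text.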

What I need from testers:

Linux x86_64 environment
willingness to test on a real dataset, not toy samples
at least 1,000 records; 5,000+ preferred for stronger validation
ability to report installation issues, runtime behavior, failure modes, output quality, and edge cases in a structured way
willingness to share feedback that is actually usable for certification and release hardening

Important constraints:

the repo is the beta access point, not a source-code release
access requires a personal free beta license
the license is machine-specific
testers must use their own API key / credits
this is best suited for developers, data engineers, NLP practitioners, or technical teams already working with text preprocessing / LLM data pipelines

If you are interested and you can run a serious validation pass, the beta repo is here:

https://github.com/mentoratechnologies/PurifyFactory-Beta

I’m especially interested in feedback from people already using the OpenAI API in production or evaluation pipelines, because I want to understand how Purify Factory behaves in real data-preparation workflows, not just in isolated demos.

If you match this profile and want to participate, reply in the thread or reach out through the repo instructions.

_j · March 14, 2026, 11:10am

This is preposterously dumb.

Download closed-source Linux software, let it scrape and transmit data about your system.
Develop your own data with thousands of entities and provide it to someone.
Use your own API credits for whatever the code wants to perform.
All to benefit nobody but a for-profit closed entity that joined the forum two days before advertising a repo with one contributor and nothing else.

Oh, and most amazingly, you also write the system message: "This quality standard is defined by you through the system prompt — describe the cleaning rules you want to apply in natural language, and PurifyFactory applies them consistently and verifiably to every record in the dataset." So you get to be the prompt engineer for someone who can't do that themselves in order to deliver their product.

My “feedback” is that this deserves a lock and a de-list from the forum. $50 is my rate per 15-minute increment of my time; I’ll be sending the bill.
