FlashLearn - Integrate LLMs in any structured data pipeline



FlashLearn provides a simple interface and orchestration (up to 1000 calls/min) for incorporating Agent LLMs into your typical workflows and ETL pipelines. Conduct data transformations, classifications, summarizations, rewriting, and custom multi-step tasks, just like you’d do with any standard ML library, harnessing the power of LLMs under the hood. Each step and task has a compact JSON definition which makes pipelines simple to understand and maintain. It supports LiteLLM, Ollama, OpenAI, DeepSeek, and all other OpenAI-compatible clients.

:rocket: Examples

:open_book: Github

Installation

pip install flashlearn

Add the API keys for the provider you want to use to your .env file.

OPENAI_API_KEY=
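
If you load the keys in Python rather than relying on your shell, here is a minimal sketch using python-dotenv (an assumption, not a FlashLearn dependency; any mechanism that populates os.environ works):

from dotenv import load_dotenv  # pip install python-dotenv (assumed helper, not part of FlashLearn)
import os

load_dotenv()  # reads key=value pairs from .env into the process environment
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is missing from your .env"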

High-Level Concept Flow

flowchart TB
    classDef smallBox font-size:12px, padding:0px;

    H[Your Data] --> I[Load Skill / Learn Skill]
    I --> J[Create Tasks]
    J --> K[Run Tasks]
    K --> L[Structured Results]
    L --> M[Downstream Steps]

    class H,I,J,K,L,M smallBox;

Learning a New “Skill”

Like a fit/predict pattern, you can quickly “learn” a custom skill. Below, we’ll create a skill that evaluates the likelihood of buying a product from user comments on social media posts, returning a score (1–100) and a short reason. We’ll instruct the LLM to transform each comment according to our custom specifications.

from flashlearn.skills.learn_skill import LearnSkill
from flashlearn.client import OpenAI

# Instantiate your pipeline “estimator” or “transformer”
learner = LearnSkill(model_name="gpt-4o-mini", client=OpenAI())
# Provide instructions and sample data for the new skill
skill = learner.learn_skill(
    df=[],  # optionally pass a sample of your data here
    task=(
        "Evaluate how likely the user is to buy my product based on the sentiment in their comment, "
        "return an integer 1-100 on key 'likely_to_buy', "
        "and a short explanation on key 'reason'."
    ),
)

# Save skill to be used from any system
skill.save("evaluate_buy_comments_skill.json")


Input Is a List of Dictionaries

Whether you retrieved data from an API, a spreadsheet, or user-submitted forms, you can simply wrap each record into a dictionary. FlashLearn’s “skills” accept a list of such dictionaries, as shown below:

user_inputs = [
    {"comment_text": "I love this product, it's everything I wanted!"},
    {"comment_text": "Not impressed... wouldn't consider buying this."},
    # ...
]
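
If your records live in a pandas DataFrame or a CSV instead, converting them into this shape is a one-liner (a sketch; pandas and the file name "comments.csv" are assumptions, not FlashLearn requirements):

import pandas as pd

df = pd.read_csv("comments.csv")  # assumed file with a "comment_text" column
user_inputs = df[["comment_text"]].to_dict(orient="records")  # -> list of dicts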

Run in 3 Lines of Code

Once you’ve defined or learned a skill, you can load it as though it were a specialized transformer in a standard ML pipeline. Then apply it to your data in just a few lines:

import json

from flashlearn.skills.general_skill import GeneralSkill

# Load the skill definition we saved earlier as "evaluate_buy_comments_skill.json"
with open("evaluate_buy_comments_skill.json", "r", encoding="utf-8") as file:
    definition = json.load(file)

skill = GeneralSkill.load_skill(definition)

tasks = skill.create_tasks(user_inputs)
results = skill.run_tasks_in_parallel(tasks)
print(results)

Get Structured Results

FlashLearn returns structured outputs for each of your records. The keys in the results dictionary map to the indexes of your original list. For example:

{
  "0": {
    "likely_to_buy": 90,
    "reason": "Comment shows strong enthusiasm and positive sentiment."
  },
  "1": {
    "likely_to_buy": 25,
    "reason": "Expressed disappointment and reluctance to purchase."
  }
}
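
Because each key is the string index of a record in your input list, you can merge the outputs back onto the originals (a sketch reusing the results and user_inputs objects from the examples above):

# Sketch: attach each structured result back to its source record by index
merged = []
for idx, result in results.items():
    record = dict(user_inputs[int(idx)])  # copy the original input record
    record.update(result)                 # add 'likely_to_buy' and 'reason'
    merged.append(record)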

Pass on to Next Steps

Each record’s output can then be used in downstream tasks. For instance, you might:

  1. Store the results in a database
  2. Filter for high-likelihood leads (see the sketch after the example below)
  3. Send them to another tool for further analysis (for example, rewriting the “reason” in a formal tone)

Below is a small example showing how you might parse the dictionary and feed it into a separate function:

# Suppose 'flash_results' is the dictionary with structured LLM outputs
for idx, result in flash_results.items():
    desired_score = result["likely_to_buy"]
    reason_text = result["reason"]
    # Now do something with the score and reason, e.g., store in DB or pass to next step
    print(f"Comment #{idx} => Score: {desired_score}, Reason: {reason_text}")

Supported LLM Providers

Anywhere you might rely on an ML pipeline component, you can swap in an LLM:

client = OpenAI()  # Equivalent to instantiating a pipeline component
deep_seek = OpenAI(api_key='YOUR DEEPSEEK API KEY', base_url="https://api.deepseek.com")
lite_llm = FlashLiteLLMClient()  # LiteLLM integration; manages keys via environment variables, akin to a top-level pipeline manager
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')  # api_key is required but unused; Ollama exposes an OpenAI-compatible endpoint
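
Any of these clients can be passed wherever the earlier examples use client=OpenAI(), for example (a sketch reusing the deep_seek client from above; the model name "deepseek-chat" is an assumption based on DeepSeek's naming, not something FlashLearn prescribes):

learner = LearnSkill(model_name="deepseek-chat", client=deep_seek)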

KEY IDEA: JSON in, JSON out

Examples by use case

I really hope this library will simplify integrating LLMs into your existing pipelines!