Vision model for text extraction

Are there any links here? I maintain an OSS library for screen recording (google screenpipe github), and I tried GPT-4o on text extraction (OCR) tasks for screen data. It turned out to be really, really bad.

So I did some research and found Unstructured.io, which specializes in text extraction from images, PDFs, and other formats.

The quality is amazing, even for small fonts at 4K resolution. It's not free, but there is a free tier (you have to apply for it).

Here is sample code in Rust:
use anyhow::{Context, Result};
use reqwest::multipart::{Form, Part};
use std::env;
use std::fs::File;
use std::io::Read;
use std::path::Path;

#[tokio::main]
async fn main() -> Result<()> {
    // Fail early if the key is missing instead of sending a placeholder string.
    let api_key = env::var("UNSTRUCTURED_API_KEY").context("UNSTRUCTURED_API_KEY is not set")?;
    let api_url = "".to_string(); // I had to remove the link, otherwise I can't post it on the OpenAI forum
    println!("API Key: {}", api_key); // Prints the full key, for debugging only
    println!("API URL: {}", api_url);

    let file_path = "Screenshot.png";

    // Read the whole screenshot into memory.
    let mut file = File::open(file_path)?;
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer)?;

    // Attach the image bytes as a multipart file part.
    let file_name = Path::new(file_path)
        .file_name()
        .and_then(|name| name.to_str())
        .context("file path has no valid UTF-8 file name")?;
    let part = Part::bytes(buffer)
        .file_name(file_name.to_string())
        .mime_str("image/png")?;

    let form = Form::new()
        .part("files", part)
        .text("strategy", "auto") // auto/fast/hi_res/ocr_only
        .text("coordinates", "true"); // include element coordinates in the response
        // .text("hi_res_model_name", "detectron2_onnx"); // or yolox

    let client = reqwest::Client::new();
    let response = client
        .post(&api_url)
        .header("accept", "application/json")
        .header("unstructured-api-key", &api_key)
        .multipart(form)
        .send()
        .await?;

    println!("Response status: {}", response.status());
    println!("Response headers: {:#?}", response.headers());

    if response.status().is_success() {
        // The body is the raw JSON returned by the API.
        let text = response.text().await?;
        println!("Extracted text:\n{}", text);
    } else {
        println!("Error: {}", response.status());
        println!("Response: {}", response.text().await?);
    }

    Ok(())
}
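
The sample above hard-codes "image/png" as the MIME type. Since the service also takes PDFs and other formats, a small helper could pick the MIME type from the file extension instead. This is only a sketch: `guess_mime` is a hypothetical name of my own (not part of reqwest or the Unstructured API), it covers just a few extensions, and I have not checked how the service treats the generic fallback type.

```rust
use std::path::Path;

/// Hypothetical helper: choose a MIME type from the file extension,
/// ignoring case. Extend the match arms for other formats you upload.
fn guess_mime(path: &Path) -> &'static str {
    // Lowercase the extension so "PNG" and "png" are treated the same.
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("png") => "image/png",
        Some("jpg") | Some("jpeg") => "image/jpeg",
        Some("pdf") => "application/pdf",
        // Generic fallback for anything not listed above.
        _ => "application/octet-stream",
    }
}
```

With this in place, `.mime_str("image/png")?` in the sample could become `.mime_str(guess_mime(Path::new(file_path)))?`.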

written by Matthew Diakonov