(Unofficial) Weekend Project / Hackathon Ideas

Extract Images
<?php
error_reporting(E_ALL);
ini_set('display_errors', 1);

$filename = "104-10004-10143 (C06932208).pdf";
if (!file_exists($filename)) {
    die("File not found\n");
}
$content = file_get_contents($filename);
if ($content === false) {
    die("Could not read file\n");
}
echo "PDF loaded\n";

$offset = 0;
$imageIndex = 0;

while (($objPos = strpos($content, "/Subtype/Image", $offset)) !== false) {
    echo "Found image marker at $objPos\n";

    // Try to find nearest object header before this
    $searchWindow = substr($content, max(0, $objPos - 1000), 1000);
    $objMatch = [];
    if (preg_match('/(\d+)\s+0\s+obj/', $searchWindow, $objMatch, PREG_OFFSET_CAPTURE)) {
        echo "Found object start: " . $objMatch[0][0] . "\n";

        $streamPos = strpos($content, "stream", $objPos);
        if ($streamPos === false) {
            echo "Stream not found\n";
            break;
        }
        $endstreamPos = strpos($content, "endstream", $streamPos);
        if ($endstreamPos === false) {
            echo "endstream not found\n";
            break;
        }

        $streamStart = $streamPos + strlen("stream");
        // Skip the EOL after the "stream" keyword without reading past the end of the buffer
        while ($streamStart < strlen($content) && in_array($content[$streamStart], ["\r", "\n"])) {
            $streamStart++;
        }

        $streamData = substr($content, $streamStart, $endstreamPos - $streamStart);

        // Scan a bit before the marker for filter info; clamp the offset so it can't go
        // negative (a negative offset makes substr read from the end of the file instead).
        $filterInfo = substr($content, max(0, $objPos - 500), 500);
        $extension = "bin";

        if (strpos($filterInfo, "/Filter /FlateDecode") !== false) {
            $extension = "raw";
            $decompressed = @gzuncompress($streamData);
            if ($decompressed !== false) {
                $streamData = $decompressed;
            } else {
                echo "FlateDecode decompression failed\n";
            }
        } elseif (strpos($filterInfo, "/Filter /JBIG2Decode") !== false) {
            $extension = "jb2";
        } elseif (strpos($filterInfo, "/Filter /DCTDecode") !== false) {
            $extension = "jpg";
        }

        $outFile = "image_$imageIndex.$extension";
        file_put_contents($outFile, $streamData);
        echo "Saved to $outFile\n";

        $offset = $endstreamPos + 9;
        $imageIndex++;
    } else {
        echo "Couldn't find object start near $objPos\n";
        $offset = $objPos + 15;
    }
}
?>
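
One fragile spot worth flagging in the extractor: the filter checks look for "/Filter /FlateDecode" with a single space, but PDF writers often emit "/Filter/FlateDecode" with no space at all, which is likely why the image below came out as a still-compressed .bin instead of a decompressed .raw. A whitespace-tolerant sketch, where detectExtension is a hypothetical helper working on the same $filterInfo window (the array form "/Filter [ /FlateDecode ]" would still need extra handling):

<?php
// Sketch: whitespace-tolerant filter detection for the extractor above.
// detectExtension() is a hypothetical helper; it assumes the same $filterInfo window.
function detectExtension(string $filterInfo): string {
    if (preg_match('#/Filter\s*/FlateDecode#', $filterInfo)) return 'raw';
    if (preg_match('#/Filter\s*/JBIG2Decode#', $filterInfo)) return 'jb2';
    if (preg_match('#/Filter\s*/DCTDecode#', $filterInfo))   return 'jpg';
    return 'bin';
}
?>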
bintopng
<?php
// Turn on error reporting for debugging
error_reporting(E_ALL);
ini_set('display_errors', 1);

// Image dimensions from the PDF metadata
$width  = 2563;
$height = 3309;
// For a 1-bit image, each row uses ceil(width/8) bytes.
$rowBytes = (int) ceil($width / 8);
$expectedLength = $rowBytes * $height;
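// For this file: ceil(2563 / 8) = 321 bytes per row, and 321 * 3309 = 1,062,189 bytes expected.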

// Read the extracted compressed image file
$compressedData = file_get_contents('image_0.bin');
if ($compressedData === false) {
    die("Failed to read image_0.bin\n");
}
echo "Read compressed data (" . strlen($compressedData) . " bytes).\n";

// Attempt to decompress the data.
// Try gzuncompress first.
$decompressed = @gzuncompress($compressedData);
// If gzuncompress fails, try gzinflate.
if ($decompressed === false) {
    $decompressed = @gzinflate($compressedData);
}
if ($decompressed === false) {
    die("Failed to decompress image data\n");
}
echo "Decompressed data to " . strlen($decompressed) . " bytes.\n";

if (strlen($decompressed) < $expectedLength) {
    die("Decompressed data is smaller than expected: got " . strlen($decompressed) . " bytes, expected $expectedLength bytes\n");
}

// Now we have the raw image data.
$imageData = $decompressed;

// Create a truecolor image using GD
$image = imagecreatetruecolor($width, $height);
if (!$image) {
    die("Failed to create GD image\n");
}

// Allocate black and white colors
$black = imagecolorallocate($image, 0, 0, 0);
$white = imagecolorallocate($image, 255, 255, 255);

// Process the raw data row by row and pixel by pixel
for ($y = 0; $y < $height; $y++) {
    // Get one row (packed in $rowBytes bytes)
    $rowData = substr($imageData, $y * $rowBytes, $rowBytes);
    for ($x = 0; $x < $width; $x++) {
        // Calculate which byte and bit represent this pixel.
        $byteIndex = (int)floor($x / 8);
        $bitIndex = 7 - ($x % 8); // Most-significant bit first
        $byte = ord($rowData[$byteIndex]);
        $bit = ($byte >> $bitIndex) & 1;
        
        // The /Decode array [0 1] in the PDF means: 0 → black, 1 → white.
        $color = ($bit === 1) ? $white : $black;
        imagesetpixel($image, $x, $y, $color);
    }
}

// Save the resulting image as a PNG file.
$outputFile = "converted.png";
if (imagepng($image, $outputFile)) {
    echo "PNG image saved as $outputFile\n";
} else {
    echo "Failed to save PNG image\n";
}

// Free the GD image from memory.
imagedestroy($image);
?>
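
Not part of the scripts above, but since the decompressed data is already packed 1 bit per pixel, MSB first, with each row padded to a byte boundary, a quick way to sanity-check it without GD is to dump it as a binary PBM (P4). PBM treats 1 as black while this image uses 1 for white, so the bytes simply get inverted. A minimal sketch, assuming $imageData, $width, $height and $expectedLength from the converter above:

<?php
// Sketch: write the packed 1-bit data as a binary PBM (P4), no GD needed.
// Assumes $imageData, $width, $height and $expectedLength from the converter above;
// "converted.pbm" is just an illustrative output name.
// PBM P4 uses the same packing (rows padded to a byte, MSB first) but 1 means black,
// so every byte is inverted; PHP's ~ operator applies bitwise NOT per byte on strings.
$raster = ~substr($imageData, 0, $expectedLength);
file_put_contents('converted.pbm', "P4\n$width $height\n" . $raster);
echo "PBM image saved as converted.pbm\n";
?>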

The models are AWESOME… 4o struggled but got it on attempt 3, and o3-mini fixed the last bug…

converted.png

Proving this to the masses and engaging them is your job as a ‘Regular’…

Head in a box is just a Model or CPU

You mean the text extraction? Yes, that worked pretty okish. Not perfect but better than raw tesseract.

But the reasoning over 20 metrics to get a score from 1 to 100? I mean I did not expect much and got a little bit.

My job has nothing to do with my hobby, “Regular”.
Whatever I do here is purely what I want.

Further suggestions…

  1. Unlock the JFK thread… Posts CAN be deleted; I’m happy for my arguments to be removed for better signal strength.

  2. Maintain these threads with a SOLUTION or VALUE ADD disclaimer.

I have more…

Arm’s-length forum, yes… ‘Regular’… Now that doesn’t chime at quite the same pitch… Anarchy can’t rule…

  • 1793: During the French Revolution, the French National Convention stated, “They must consider that great responsibility follows inseparably from great power.”
  • 1817: British parliamentarian William Lamb remarked, “the possession of great power necessarily implies great responsibility.”
  • 1906: Winston Churchill, addressing the colonial responsibilities in South Africa, asserted, “Where there is great power there is great responsibility.”

Spider-Man… the phrase first appeared in Amazing Fantasy #15 (1962) as a narrative caption:

“…with great power there must also come—great responsibility!”

Do GREAT THINGS!

I don’t see any reason why it should be reopened. There was no other contestant, so what kind of a competition is that?
I mean, obviously I could have just put that in here to “win” it.

WEEKEND PROJECT! … Unless you feel threatened?

I am sure given a little time your answer will look dated too :smiley:

Will you take on the RESPONSIBILITY for updating your thread?

when a domain expert on the JFK case asks

Thar’s bugs in ur script… buffer overrun

Another question

One entry on Developer Forum Weekend Project / Hackathon

Head of Developer Experience at OpenAI @romainhuet ^^

My hack, or rather feeding a log of JSON parse errors back to an AI model and letting it keep working, turns the API documentation (in the JavaScript list-of-objects form served to the platform site) into a Python object, and formats it with code into something that can be rendered however one desires, like forum markdown or AI input.

Thank you @_j embrace the entropy!

What are the numbers on the left side of the code?

Screenshot from 2025-03-25 18-59-14

Are you sure that’s a usable base for something?
I mean, it is obvious that the code is messy: anything that has more than 200 lines of code per file is by definition shit code. It becomes unmanageable except for the guys who made it in the first place (for up to a month after they wrote it, and then it becomes a relic).

So we have three options:

  • Hack together awkward workarounds to make aioquic work with Trio (which is messy), or

  • Avoid Trio entirely and use asyncio (which sacrifices all the structured concurrency benefits), or

  • Build something new, like in a few weekend hackathons: something modular, where one can work on the relevant parts.

Something like this:

I’m going to stay out of this one and let you guys work it out… Get a room!

Opinions of this file? I highlighted who the “shit coder” of cloud AI transports is. :clown_face:

You think there are no shit coders working at Google?

I saw something funny today… I think it’s worthy of one post here… It’s kinda relevant… Best tech support minds in the world on this...

Hackathon Ideas… Address Book Management?

Midweek Challenge: User Input Clarification and Enhancement

Background: This challenge is suitable for a broad audience of teams. It mainly involves prompting and selecting the AI model most effective for the task, without unnecessary expense or time.

Challenge Overview: User Input Prompt Rewriter

Goal of the Challenge:

Users often have a clear idea of what they want. However, the way they type their requests can be unclear or confusing for AI models. You aim to fix that.

Example Scenario:

User Input: “The composition now looks like a hobo on bath salts wrote it.”

Possible AI confusion:

  • Should the AI apologize for the poor quality?

  • Should the AI revert to a previous version?

  • Should the AI immediately rewrite the composition without guidance?

  • Should the AI address the inappropriate language or substance abuse reference?

Your Task:

Design a new API-based AI function that processes unclear or confusing user input messages, or even makes already-good input better. Focusing on the singular task of understanding gives higher attention to the user’s underlying desires than when the text is used as normal input.

The prompting and what it solicits and entails should consider the user’s input and any available context. It will then produce a clear, complete, and standalone rewritten message. This rewritten message should clearly express the user’s original intention and desired outcome, removing ambiguity and confusion.

The actual rewritten input that gets sent might never need to be shown to the human. It could also be an addition, “to further clarify…

Key Needs:

  • Output is a convincing replication of what a user would input, to avoid any other confusion

  • The rewritten message must clearly state the user’s intention.

  • The rewritten message must be complete and understandable without additional context.

  • The rewritten message must resolve any unclear references and incorporate them explicitly.

  • The rewritten message must align with the capabilities and performance of AI models.

Your solution should help AI models better understand and fulfill user requests, even when the original input is unclear or poorly expressed.
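
Not a reference solution, just a sketch of one shape such a function could take, calling the Chat Completions endpoint from PHP; rewriteUserInput is a hypothetical helper, the system prompt is illustrative, and the model name is a placeholder for whatever your entry actually targets:

<?php
// Sketch: a user-input rewriter as a single API call.
// rewriteUserInput() is a hypothetical helper; the system prompt and model are placeholders.
// OPENAI_API_KEY is read from the environment; the PHP curl extension is assumed.
function rewriteUserInput(string $userInput, string $context = ''): string {
    $payload = [
        'model' => 'gpt-4o', // placeholder: pick whichever model your entry targets
        'messages' => [
            ['role' => 'system', 'content' =>
                "Rewrite the user's message as a clear, complete, standalone request that states " .
                "their intention and desired outcome, resolving vague references using the given " .
                "context. Output only the rewritten message."],
            ['role' => 'user', 'content' => "Context:\n$context\n\nUser message:\n$userInput"],
        ],
    ];

    $ch = curl_init('https://api.openai.com/v1/chat/completions');
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . getenv('OPENAI_API_KEY'),
        ],
        CURLOPT_POSTFIELDS     => json_encode($payload),
    ]);
    $response = curl_exec($ch);
    curl_close($ch);

    if ($response === false) {
        return $userInput; // fall back to the original message on transport errors
    }
    $decoded = json_decode($response, true);
    return $decoded['choices'][0]['message']['content'] ?? $userInput;
}
?>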

Hints for internal chain-of-thought:

  • break down and parse input language and decode statement meanings
  • restate new goals and what shall be fulfilled
  • outline the context or specialization in which the request is taking place
  • employ prior turns to dereference the implied task.
  • deal appropriately with huge inputs, like massive code dump or document
  • profit when the AI clearly understands what it needs to produce now!

Your entry

  • the prompt
  • the model
  • the context you place

The winner

  • the community

If it gains enough traction, someone could move great ideas (or not) to a new topic.

“production ready” haha.. not yet.. will need another day most likely - too lazy to test on a live server now.

I applied such an AI technique more publicly on the forum here a bit ago, when the “user input” was an incorrigible forum topic creation… and the output discovers and rewrites the underlying intention.

Reasoning being done on it, for the curious.

Understand the Goal: The user wants a way to list all Threads associated with a specific Assistant in the OpenAI API.

Identify the Core Problem: The user is creating many Threads because runs sometimes fail or time out. They want to reuse threads, but failures force them to abandon threads and create new ones. They cannot clean up these abandoned threads because the API currently lacks a way to list threads filtered by the Assistant they were created for/with. This leads to orphaned resources.

Analyze the Original Input - Sentence by Sentence:

  • “Our app chunks message no larger than 1000 characters and sends to chunks as messages on 10 threads.”

    Meaning: The application breaks large messages into smaller pieces (<= 1000 chars) and distributes these pieces across 10 different Threads simultaneously.

    Clarity: Reasonably clear, but “sends to chunks” is slightly awkward. “sends these chunks as messages” is better. The use of 10 threads seems like a specific implementation detail, maybe important context for why they need cleanup, but not the core request itself.

    Underlying Need: Explaining their current architecture and how threads are used.

  • “So long as the run responds in a timely manner, I reuse the threads.”

    Meaning: If an Assistant Run on a Thread completes successfully and quickly, they continue using that same Thread for future interactions.

    Clarity: Clear.

    Underlying Need: Highlighting the desired state (thread reuse).

  • “If the request times out or an HTTP error or some other issue occurs, I cannot STOP the run or the thread despite sending requests to cancel the run and then the thread because ChatGPT usually throws some error.”

    Meaning: When API calls (likely create_run, retrieve_run, or maybe even cancel_run) fail (timeout, HTTP error, other issues), their attempts to cancel the ongoing Run and subsequently manage the Thread fail, often resulting in API errors. “ChatGPT” is used informally here, meaning the OpenAI API.

    Clarity: A bit run-on. “Cannot STOP the run or the thread” is slightly inaccurate; you can attempt to cancel a run, but deleting a thread isn’t directly tied to stopping a run (though an active run might prevent deletion, which isn’t explicitly stated but implied). The core issue is failed runs lead to unusable threads they can’t reliably clean up.

    Underlying Need: Describing the failure scenario that leads to abandoned threads. Highlighting the unreliability of the cancellation process in their experience.

  • “So I start new threads.”

    Meaning: As a consequence of the previous point (failed runs/cancellation), they abandon the problematic thread and create a new one to continue the user interaction.

    Clarity: Clear.

    Underlying Need: Explaining the direct result of the failure scenario – thread proliferation.

  • “If I can’t get a response in 20 seconds, it’s useless.”

    Meaning: Their application has a strict timeout requirement (20 seconds) for Assistant responses. Runs exceeding this are considered failures.

    Clarity: Clear.

    Underlying Need: Providing context for why they consider runs failed (timeouts are a key reason).

  • “I would love to clean up abandoned threads, but can’t because I have no way of obtain a list of threads belonging to the assistant.”

    Meaning: The user wants to delete the threads created due to failures but lacks the necessary API functionality: listing threads associated with a specific Assistant ID.

    Clarity: Clear, gets to the core request. “obtain a list” is slightly formal but fine.

    Underlying Need: Stating the desired action (cleanup) and the blocker (missing API feature).

  • “I can list everything EXCEPT threads, so I can’t delete the threads.”

    Meaning: They are aware of other list endpoints (Assistants, Files, etc.) but emphasize the specific lack of a Thread listing endpoint (especially one filterable by Assistant). The second clause “so I can’t delete the threads” isn’t strictly true; they can delete threads if they have the Thread ID, but they can’t find the IDs of the orphaned threads associated with a specific Assistant to delete them en masse or systematically.

    Clarity: Mostly clear, but the causal link (“so I can’t delete”) is slightly imprecise. The real issue is finding the threads to delete.

    Underlying Need: Reinforcing the missing API capability and its consequence.

  • “For the love of god… add an endpoint to obtain the threads belonging to an assistant.”

    Meaning: An emphatic plea for the specific feature request: an API endpoint to list Threads, filterable by Assistant ID.

    Clarity: Clear, strong emotional tone.

    Underlying Need: The direct feature request.

Determine the Domain: OpenAI API, specifically the Assistants API (Threads, Runs, Assistants).

Identify the Underlying Desire/Intention: The user needs a way to manage (specifically, find and delete) Threads associated with a particular Assistant, especially those abandoned due to failed Runs (timeouts, errors). Their current architecture (multiple threads, timeouts) exacerbates this problem. They are frustrated by the lack of a specific API endpoint (list_threads potentially filterable by assistant_id).

Reconstruct Step-by-Step (incorporating original voice/tone where appropriate):

  • “Okay, so here’s the deal with our setup using the Assistants API. We’ve got this system where we break down user messages into smaller chunks, about 1000 characters max, and fire them off as messages across like 10 different threads for an assistant to process.”

  • “Ideally, when a run finishes up quickly and everything’s smooth, we just keep using the same thread for the next message. That works great.”

  • “But things go sideways sometimes. If a request takes too long – anything over 20 seconds is basically useless for our app – or we hit an API error, we run into trouble. We try to cancel the run, maybe even clean up the thread right then, but honestly, that often just throws more errors or doesn’t work reliably.”

  • “So, what happens? We have to ditch that problematic thread and spin up a new one just to keep things moving for the user.”

  • “The result is we end up with a growing pile of these abandoned threads.”

  • “We really want to clean these up – delete the old, unused ones. But we’re stuck. We can list assistants, files, you name it… but there’s no way to get a list of all the threads that belong to a specific assistant.”

  • “Without that list, we can’t find the IDs of the threads we need to delete.”

  • “Seriously, please, could you add an API endpoint to list threads, specifically with the ability to filter them by the Assistant ID they’re associated with? We desperately need this to manage our resources properly.”

I will simply reference a new idea I shared this week… I jumped in rather cheekily last week, as is my playful and expressive nature. I think, whether it is regulars or members running these Weekend Projects, I have had my go :slight_smile:

A new Weekend Project my son did this weekend, for anyone looking for ideas.

Black Swan Podcast Powered By OpenAI API.