I’ve tried this prompt several times. The AI always gets something wrong in the layer shapes or the image conversions, but the code frequently comes close to working. I have access to many models via OpenRouter, except for o1. On OpenRouter, I suspect o1-preview answers too quickly to be doing a decent amount of “thinking”. On ChatGPT, o1 got really close to working on the first try. I’m curious whether o3 can get it right. Is anyone with access willing to give it a try?
Write a program that attempts to be an AI that colorizes an image. It will convert the original grayscale image to HSV, keep the V layer and replace H and S with noise, then run denoising steps on H and S to colorize it. Finally, it converts the resulting image back to RGB and presents it to the user.
Hard requirements:
* Implement and train a U-Net, where you must convert the image to HSV, add noise only to the H and S layers, and denoise while preserving the original V values, then convert the image back to RGB. These image conversions can be performed on either the CPU or the GPU, whichever is simpler for the program
* Perform the denoising over multiple steps
* Generate a preview in a popup window when a checkpoint is loaded and after each training epoch
* After each epoch you must save a checkpoint, without overwriting the previous ones
* At program start, check whether a checkpoint already exists; if one does, load the latest checkpoint available. You must not fail if the directory hasn't been created yet
* The dataset must be downloaded automatically through whatever ML framework you're using
* Don't use TensorFlow, as it doesn't support CUDA on Windows
* You must use CUDA and run on Windows
* Make it a single source file, to keep our process simple.
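
Not part of the prompt, but for anyone comparing model outputs: below is a rough, stdlib-only sketch of two requirements that are easy to check in isolation, namely keeping V while randomizing H and S, and the checkpoint rules (load the latest, never overwrite, tolerate a missing directory). It works on single pixels with `colorsys` rather than on tensors, and the function names and the `ckpt_epoch_NNNN.pt` file naming are my own assumptions, not something the prompt specifies. A real solution would do the same thing on batches inside the training loop.

```python
import colorsys
import random
import re
from pathlib import Path

# Hypothetical naming scheme; the prompt does not prescribe one.
CKPT_PATTERN = re.compile(r"^ckpt_epoch_(\d+)\.pt$")


def noisy_hs_keep_v(gray, rng):
    """For one grayscale pixel in [0, 1], return (h, s, v) with random H and S
    and the original V left untouched."""
    # A gray pixel has R = G = B, so its HSV form is (0, 0, gray).
    _, _, v = colorsys.rgb_to_hsv(gray, gray, gray)
    return rng.random(), rng.random(), v


def latest_checkpoint(ckpt_dir):
    """Return (epoch, path) of the newest checkpoint, or None if there is none."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():  # first run: the directory may not exist yet
        return None
    found = []
    for path in ckpt_dir.iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Compare epochs numerically so that epoch 10 ranks above epoch 9.
    return max(found, key=lambda item: item[0]) if found else None


def next_checkpoint_path(ckpt_dir):
    """Return a fresh path for the next epoch, creating the directory if needed."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    latest = latest_checkpoint(ckpt_dir)
    epoch = latest[0] + 1 if latest else 1
    return ckpt_dir / f"ckpt_epoch_{epoch:04d}.pt"
```

A quick sanity check for generated code: since V in HSV is the maximum of R, G and B, converting the denoised HSV image back to RGB should give pixels whose largest channel equals the original grayscale value. If that doesn't hold, the V layer was not preserved.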