Super Simple PHP / Vanilla Javascript for DALLE3 API (+ Programming Languages Debate!)

I’m not an expert, nor am I aligned with any ideological camp; I view this from a technical perspective, in case anyone is interested…

As an example, imagine a rendering engine like Maya, but instead of using ray tracing, it uses AI elements. Unlike DALL-E, which simply renders an entire image, a specialized tool has various AI algorithms that are trained to solve a specific task well and add it to an image. For example, global illumination, caustics, texture, reflection, etc., while others handle animation.

A human artist receives tools that allow them to create and optimize meshes faster, then animate them. They no longer need to deal with complicated and tedious technical tasks like rigging. All these specialized neural network systems are AI functions, which can be broken down into a few fundamental algorithms. It’s more about graph theory and how these graphs can be trained for specific tasks to be as efficient as possible. These are nested within each other, and eventually, you have Maya software that uses AI instead of ray tracing, with many specialized sub-functions. And an AI language uses these almost like a function, but within a network-like graph system.

This is how programs evolved. If you want to express it that way, AI languages are not like today’s 1-dimensional programming languages, where functions follow a line or sequence from a list, but instead are a multidimensional graph in which the important parts are networked.

At some point, this becomes too complex for a human to program. We will use AI much like we use a lathe. A lathe can work with a huge piece of metal, something a human couldn’t do physically, neither in terms of weight nor precision. So, we create a tool, and then create a tool to build a tool that will ultimately produce the product. And that’s how AI development will proceed. And we will simply understand even less of their functions, just as we don’t really understand software today either.
As humans, we already have trouble with basic logic today. Network-like processes and feedback loops, and especially both together, are too much for us. But we can understand the principles. (there is some… i can not talk here… but think about…)

The most fascinating thing about it is that a simple computer with minimal functions (storing data, basic math, comparisons, jumping in memory) can fulfill this vast range of functions (writing text, playing music, playing videos, simulating 3D games, ray tracing, and so on and so forth). And all based on nothing else than numbers, coded in {0,1}. It perfectly exemplifies that the whole is more than the sum of its parts. And AI is simply the next stage of this phenomenon, and yes, it is impressive, especially because the foundation is so simple. The same applies to the universe, by the way.

Today, we are fascinated by it, just as we were back when the first PCs appeared, I witnessed that. Eventually, we’ll get used to its functions, and it will become a regular tool.

The main problem isn’t the tools, but the people using them and their motivations. And somehow, NO ONE wants to discuss that. A tool is a will-power and motivation amplifier. And at some point, this amplification will become too great for human insanity.
Melee weapon > ranged weapon > projectile weapon > Chemical and biological weapon > genetical weapon > atomic bomb > hydrogen bomb > quantum bomb (CERN) >…
So far, we have focused 95-99% of our energy on parasitism and destruction. Money, resources, intelligence, and human lifetime. And we project the danger onto the tools instead of looking in the mirror. If the insanity doesn’t stop, it will be over soon, and AI is NOT to blame for that.

So in short, I don’t care much that we won’t understand an AI language…

… But I can delete the post if it’s annoying. It’s your post, @PaulBellow, so I’ll delete it if it’s too much.

I doubt it… At least not with the current transformer-based models. The reasoning being that they have a heavy preference for parroting back the tokens in their training distribution, which are 99% human-readable things. If anything, I’m beginning to think the code they write (and run) will be even more human readable. Here’s a conversational programming language called Convo that I had ChatGPT create over the weekend:

You can think of a Convo program as just a detailed design spec. Basically like the spec a PM would give a Developer. It expresses the goals, behavior, and even look and feel in detail as a form of pseudo code. The interesting bit is that you can pass one of these convo programs to an LLM and ask for a python version of the program. What you get back is a much more consistent implementation of the program than if you had just said “create the game snake in python”. There’s still some variance, but the resultant program is way more likely to have all of the features and look & feel that are outlined in the convo program.

In my SynthOS project I’m essentially using a variant of this to pass a program request from my top level voice agent to the programming agent powered by o1. This is effectively just a prompt and there’s nothing overly special about the format of a convo program. Any instruction prompt is technically a valid convo program.

At the end of the day… I believe that agents will essentially program each other by just talking to each other using language. That’s how humans program each other. You get your kids to execute a room cleaning program by saying to them “go clean your room”. I think agents will do the same thing.

Having access to a “world model” helps too, no? Like, wasn’t that why Sora was so cool (mostly)… the ability for an LLM to start thinking in 3D space?

Thanks for the link. Wish I had more time…

Convo is actually really easy… It’s just a spec you tack on the end of your ChatGPT prompt…

Using convo: create the game snake

{{convo-spec}}

And what you’ll get back is this:

Section: Game Setup

Create a grid with 20 rows and 20 columns.
Create a snake starting at the center of the grid with a length of 3.
Set the initial direction of the snake to "right".
Place a food item randomly on the grid.

Section: Game Loop

While the game is running,
    Display the grid with the snake and food.
    Ask the user for the direction (up, down, left, right).
    If the direction is valid, then
        Update the snake's direction to the user's input.
    Move the snake in the current direction.
    If the snake eats the food, then
        Increase the snake's length by 1.
        Place a new food item randomly on the grid.
    If the snake collides with itself or the wall, then
        End the game and display "Game Over! Your score is [snake's length]."

Section: Functions

To display the grid with the snake and food,
    Clear the screen.
    For each cell in the grid,
        If the cell contains a part of the snake, then
            Display "S".
        Else if the cell contains food, then
            Display "F".
        Else
            Display ".".

To move the snake in the current direction,
    Calculate the new head position based on the current direction.
    Add the new head position to the snake.
    Remove the last part of the snake unless it has just eaten food.

To place a food item randomly on the grid,
    Choose a random empty cell on the grid.
    Place the food item in that cell.

You can then tweak the spec in any way you want or convert it to python as is:

create a python implementation of the following program:

{{convo-program}}

Works with Claude, ChatGPT, OpenRouter, whatever…
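For reference, here is a minimal, non-interactive Python sketch of the core logic the Convo spec above describes. This is my own hand-written rendering, not actual model output, and the display and input loop are left out:

```python
import random

GRID_SIZE = 20  # 20 rows x 20 columns, per the spec

# Row/column deltas for each direction.
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def new_snake():
    """Snake of length 3 starting at the center of the grid, head first."""
    r, c = GRID_SIZE // 2, GRID_SIZE // 2
    return [(r, c), (r, c - 1), (r, c - 2)]


def place_food(snake):
    """Choose a random empty cell on the grid for the food."""
    empty = [(r, c) for r in range(GRID_SIZE) for c in range(GRID_SIZE)
             if (r, c) not in snake]
    return random.choice(empty)


def move(snake, direction, food):
    """Advance the snake one step.

    Returns (new_snake, ate_food, game_over).
    """
    dr, dc = DIRECTIONS[direction]
    head = (snake[0][0] + dr, snake[0][1] + dc)
    # Wall collision ends the game.
    if not (0 <= head[0] < GRID_SIZE and 0 <= head[1] < GRID_SIZE):
        return snake, False, True
    # Self collision ends the game.
    if head in snake:
        return snake, False, True
    ate = head == food
    body = snake if ate else snake[:-1]  # grow only after eating
    return [head] + body, ate, False
```

The interesting comparison is between this and whatever a model produces from the spec directly; the spec pins down grid size, starting length, and growth rules, so those decisions no longer vary between generations.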

UPDATE…

using convo: create a program for a robot to clean a room

Section: Cleaning Procedure

To clean the room:
    If there is trash in the room, then
        Pick up the trash.
        Throw the trash into the bin.
    
    If there is dust on surfaces, then
        Wipe the surfaces with a cloth.
    
    If the floor is dirty, then
        Sweep the floor.
        Mop the floor.
    
    For each object on the floor,
        If the object belongs on a shelf, then
            Place the object on the correct shelf.
        If the object belongs in a drawer, then
            Place the object in the correct drawer.
    
    Check if the room is now clean.
    If the room is clean, then
        Display "The room is clean."
    Otherwise,
        Display "There is more cleaning to do."

Start the cleaning process by running the "clean the room" procedure.

That program actually has a bug… I would clean the floor last, so you can just take that section of instructions and move it after the For each block.

Looks like something @mitchell_d00 would dig for its old school BASIC vibes…

Might be useful for gamedev too. Hrm…

Thanks again for sharing.

He was the first to jump on it this weekend lol… We discussed quite a bit…

There’s nothing special about the format, which is why you can just paste the output into ChatGPT and say convert to python. The thing is that everything to the model is a transform. Saying “create the game snake using python” is a transform and “create the game snake using convo” is a transform. Convo is simpler than Python, so there are fewer decisions the model has to make. Fewer decisions equals more consistent responses…

Whoosh. D’oh… I’m so bad at this lol

To my credit, I figured he might be interested lol

Okay, yeah, more just natural language to whatever… But there is a clever / well written system prompt to set it up. (Just took a peek…)

I’ve been thinking a lot lately about the intersection between code and no-code / less-code… We kinda lump all our “devs” into just a few big categories here when it might be better to start thinking about that intersection more? I dunno…

Super happy to see so much sharing lately, though.

It’s an exciting time to be alive, for sure!

Maybe roommate programming…

import cleaner_library.inspect as Inspector
import cleaner_library.perform as Performer

def clean_the_room():
    # Inspect the room for trash and clean it up
    if Inspector.Room.check_trash():
        Performer.Room.pick_up_trash()
        Performer.Room.throw_trash()
    
    # Inspect for dust and clean surfaces
    if Inspector.Room.check_dust():
        Performer.Room.wipe_surfaces()

    # Inspect objects on the floor and organize them
    for obj in Inspector.Room.find_objects_on_floor():
        if obj.belongs_on_shelf():
            Performer.Room.place_on_shelf(obj)
        elif obj.belongs_in_drawer():
            Performer.Room.place_in_drawer(obj)

    # After objects are organized, clean the floor if dirty
    if Inspector.Room.check_floor_dirty():
        Performer.Room.sweep_floor()
        Performer.Room.mop_floor()

    # Verify cleanliness and display status
    if Inspector.Room.is_room_clean():
        Performer.Room.display_message("The room is clean.")
    else:
        Performer.Room.display_message("There is more cleaning to do.")

# Start the cleaning process
clean_the_room()

This is probably how we will “program” AIs, or better yet, construct them in the future. We will use AIs to construct other AIs. An LLM, for example, to create global illumination for a graphics renderer, and then use two AI systems, where one constructs and trains, and the other checks, until the result is correct. And in this way, you can gradually put together a complete system. And how the AIs communicate with each other will interest us just as little as assembler language interests less than 1% of programmers today. Only a few experts will engage with AI communication and will need to build “logic microscopes,” again with AI. Everyone else will use LLMs to construct things.
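The construct-and-check loop described above can be sketched as a toy optimization in Python. Everything here is hypothetical: `propose` stands in for the constructing/training system and `verify` for the checking system; real versions would be neural networks, not these stub functions:

```python
import random


def propose(best_guess):
    """Stand-in for the 'constructor' AI: mutate the current best guess."""
    return [g + random.uniform(-0.5, 0.5) for g in best_guess]


def verify(target, guess):
    """Stand-in for the 'checker' AI: score how far the guess is from correct."""
    return sum((t - g) ** 2 for t, g in zip(target, guess))


def construct(target, steps=2000):
    """Loop: one system proposes, the other checks, until the result is good enough."""
    best = [0.0] * len(target)
    best_err = verify(target, best)
    for _ in range(steps):
        candidate = propose(best)
        err = verify(target, candidate)
        if err < best_err:       # checker accepts the improvement
            best, best_err = candidate, err
        if best_err < 1e-3:      # checker declares the result "correct"
            break
    return best, best_err
```

The point is only the control flow: one system proposes candidates, the other scores them, and the loop stops once the checker is satisfied.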

I can remember the hysteria when the first PCs appeared. “We are all going to die,” “We will all lose our jobs.” And the whole thing is repeating itself.

If we weren’t the way we are… we would understand that it’s simply about finding real, useful meaning in life when there’s less or little work to do, instead of thinking about how to maximize profits and get rid of the “useless.”

What we might need to overcome is the “paradise effect.” It’s not so easy to find meaning when it’s not imposed from the outside. One has to develop mentally.

… I’m rambling again, going way off the topic. Analysts…

I didn’t say it wasn’t clever :slight_smile: just another step in my ever evolving understanding of how these models think.

I’ve been doing some A/B testing tonight with Convo and without Convo, and it clearly leads to better outputs. And not just more consistent outputs. Better outputs.

These models often struggle with task complexity. The simpler the task, the better job they do. Since convo makes a lot of the decisions for a program up front when “designing” the program, there are fewer decisions that need to be made in the second pass when the model is “implementing” the program. This reduced complexity looks like it results in better code generation, from my purely anecdotal testing. I do have a visual thing I’ll show to illustrate this though…
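The two-pass idea (design decisions first, implementation second) is plain prompt chaining. Below is a minimal sketch of that pipeline with a stubbed-out `ask_llm`; in practice the calls would go to your chat API of choice, and `CONVO_SPEC` is a placeholder for the actual spec text:

```python
CONVO_SPEC = "{{convo-spec}}"  # placeholder for the real Convo spec text


def ask_llm(prompt: str) -> str:
    """Stub for a real chat-completion call (Claude, ChatGPT, etc.)."""
    return f"<model response to: {prompt[:40]}...>"


def design_pass(task: str) -> str:
    """Pass 1: have the model make the design decisions, expressed in Convo."""
    return ask_llm(f"Using convo: {task}\n\n{CONVO_SPEC}")


def implement_pass(convo_program: str) -> str:
    """Pass 2: have the model implement the already-decided design."""
    return ask_llm(
        "create a python implementation of the following program:\n\n"
        + convo_program
    )


def build(task: str) -> str:
    """Chain both passes: design in Convo, then implement in Python."""
    return implement_pass(design_pass(task))
```

The design pass produces the Convo program; the implement pass then only has to translate decisions that were already made, which is the reduced-complexity effect described above.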

Ok here’s a screenshot of my company’s webpage:

I passed that through gpt-4o vision to extract all of the details and I want to see if I can regenerate that page from the image. Using just the output from vision here’s what I get:

Ok… on the right path… But if I take the output from vision and convert it to a convo program, then convert that to HTML, I get this:

The logo was probably closer in the vision-only generation, but overall which page would you want as a starting point? Like I said… Simply better outputs…

Both of these generations were o1-mini.

UPDATE…
I realized that it got the logo wrong because the details describing the logo didn’t make it into the convo-based description. I just updated the logo part to include a more detailed description, and this is what I get:

Here’s the generated convo program:

Create a page titled "Awarity Landing Page".

Section: Header
Display the "awarity" logo in the top-left corner. The logo is the lowercase word awarity in the middle of a large black circle that's outlined in white with 30px of padding.
Create a navigation bar in the top-right corner with links "HOME", "TEAM", "NEWS", and a button labeled "LEARN MORE".
Set the background color of the header to deep black.
Style the navigation links with bold uppercase text in bright orange-yellow.

Section: Main Content
Display the heading "Awarity is synthetic reasoning for massive datasets." in the center of the page.
Set the font to large, bold, sans-serif, and the text color to white.
Below the heading, create a button labeled "LEARN MORE".
Style the button with a bright orange-yellow background and black text.

Section: Lower Content
Create a background with a mosaic pattern using tiles in various shades of yellow, orange, and brown.
Display the subheading "Awarity harnesses LLMs to provide dependable synthetic reasoning over extremely large datasets, surpassing the capabilities of...".
Set the subheading font to bold, sans-serif, black, and slightly smaller than the main heading.
Bold the word "Awarity" in the subheading text.
Dock the Lower Content to the bottom of the page

Section: Styles
Define a style called "Button" with background color orange-yellow, text color black, and bold uppercase text.
Define a style called "Mosaic" for the lower content's background with a gradient from yellow to orange to brown.

Apply the "Button" style to the "LEARN MORE" button.
Apply the "Mosaic" style to the lower content background.

I think I understand why Convo seems to work so well…

If you notice, it rephrases the model’s output as a series of instructions. When this gets played back to the model, it leans into the model’s instruction tuning, which results in better output.

Totally… In the Star Trek universe everyone is a programmer but you never see them writing a single line of code. You code by telling the computer what you basically want and it being smart enough to get the gist.

Exactly! The complexity is so high, you cannot handle it anymore. It is what I wrote before: we create tools to create tools. We stack strength on top of strength in the physical world, and we stack complexity on top of complexity in the logic world.
The concern is more how it is all shaped: like a pyramid (stable), a tower (less stable), or an upside-down pyramid (very unstable).

I am working right now on a GPT advice set to create better prompts for DALL-E, with all I have learned so far. It is still a pain. GPT is not yet able to create working advice for itself. This is because the models are still not perfect; everything is in development now, very new and young. It is like using a BASIC language with numbers for every line (which never made any sense). Then better programming languages emerged, first modular, then object-oriented, etc. The same thing we can now experience with this new data-analytic technique.

We constructed the first logic machine with its assembler (direct machine language, which nobody can understand), built the first simple higher-level languages, then functions, then modules, then objects, then assemblies, etc., and now we have constructed, on top of all this, network-logic pattern analytics and LLMs. At the bottom is still assembler and a chip, now with some new architecture, but still a chip.

In the past I have seen languages that tried to read like human language. It never worked for me; you need a logically clear, precise, mathematical language to construct linear software. Now with pattern analytics, and everything humans have created until now, it is possible to use this as an engineer. And now we see that we can “tell” the system to create the code. And it can create old-fashioned linear code, or even weights for “AI” directly. But at the very bottom, it is still assembler.

Star Trek was sometimes smarter than what you see at the surface.

Thanks for sharing your starter code. Very cool and it worked the first time for me. Can’t beat good old PHP (7 and above)! :slightly_smiling_face:

Yeah, Convo is basically a new name for comment-driven development… When I started working with Copilot a while ago, I had to adapt to a new workflow for writing code in my files so that it could keep up with my approach without wasting too much of my time. Then for more complicated stuff I went to drafting the methods in plain comments and expanding the lines for Copilot to pick up on the approach.

Now I’m used to writing the code like that and copilot is no longer a burden.

Especially when you have something like co-routines, multi-threading, async events, and your script never dies (long-running tasks in CLI)… That makes PHP a good competitor for many tools out there.

Not trying to brag here, but:

Last year I generated a website with a MySQL backend and an API layer based on a Node.js HTTP server, with calls to OpenAI for images and for (improving) prompts.

It was really an exercise to test AI itself by letting it produce or enhance my code to verify code quality, and man, sometimes it is really bad.

It is meant for experimental stuff and not for a production site.
I basically used the idea of existing sites like gencraft to roughly recreate their functionality.
Learned a lot from it.

There is loads of code in there that should be improved, but it runs.

It has cost me around $800 on images and requests.

There were many bugs in the code, which led to long nights of debugging and also cost me unneeded microservice ticks…

The code downloads the image to the backend server, saves it to the database, and stores the b64 plus prompt in a txt file that can be used for importing the images by running an import script (ergo I store it 3 times). When clicking an image in the viewer, a local download is performed.
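For illustration only (this is not the author’s code, and the table and file names are made up), the triple-storage flow just described might look roughly like this in Python:

```python
import base64
import sqlite3
from pathlib import Path


def store_image(b64_data: str, prompt: str, image_id: str,
                db_path: str = "images.db",
                export_path: str = "export.txt") -> Path:
    """Store a generated image three ways, as described above:
    1. the decoded image on the backend server's disk,
    2. a row in the database,
    3. b64 plus prompt appended to a txt file for later re-import.
    """
    raw = base64.b64decode(b64_data)

    # 1. File on disk.
    img_file = Path(f"{image_id}.png")
    img_file.write_bytes(raw)

    # 2. Database row (sqlite here as a stand-in for MySQL).
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS images (id TEXT, prompt TEXT, data BLOB)"
    )
    con.execute("INSERT INTO images VALUES (?, ?, ?)", (image_id, prompt, raw))
    con.commit()
    con.close()

    # 3. Import file: one tab-separated line of id, prompt, b64.
    with open(export_path, "a") as f:
        f.write(f"{image_id}\t{prompt}\t{b64_data}\n")

    return img_file
```

Storing the same bytes three times is redundant, as noted, but it does make re-import trivial since the txt file alone carries everything needed to rebuild the database.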

When clicking the prompt, a new image with that prompt is requested from OpenAI. When right-clicking the prompt, an improved prompt is requested from OpenAI.

There is the option to generate X amounts of prompts at once and then send them to OpenAI.

There is an image viewer, an image uploader, a prompt generator based on local ‘ip’, and the option to request a prompt from OpenAI.

I don’t mind sharing some of the code, but as said, it is experimental and grew over time. Sometimes I do not touch it for weeks, and then I add stuff to it.

The next thing to add will be an image strip to browse the images visually.

Nice! Thanks for sharing.