The dndGPT Case Study for You and Me!

Thanks, @elmstedt, I have definitely decided to not poke sleeping bears and rebranded the whole thing. Thanks everyone for helping me.

This image was chosen out of three iterations, “in a 1940’s newsprint comicstrip style,” then taken into Photoshop to correct hallucinations, largely the text.

P.S. We used Comic Sans. :heart_eyes:

Fine-Tuning Visual Fine-Tuning

Otherwise, we’re continuing to experiment with so-called Visual Fine Tuning using the Adventure Assets presented above, and have had some unexpected progress.

“Fine tuning is for when it’s easier to ‘show, not tell’” is both the best explanation of “fine tuning” I’ve read so far and literally what we’re doing here.

Interestingly, this advice, “show, not tell,” is considered good storytelling in other professions: it is the same advice given to creative writers when illustrating a point in a novel. I believe this parallel is a helpful metaphor for fine tuning AI in general.

The Model Pays More Attention to the Image Than Expected

Perhaps the most unexpected result of the current experimentation is how much attention the model pays to the images in the Adventure Asset (or Visual Fine Tuning File).

You see the golden-sepia hue our big buddy is taking in these uncorrected images? He’s not exactly lit correctly for the scene. This color is not verbally prompted anywhere—in fact, “ivory molten bone” is specified.

This color is coming from the actual visualization of the Bone Golem as frequently presented in the Adventure Asset. (See above). :partying_face: :heart_eyes:

This is an interesting development.

The AA is presented like this because it’s supposed to look like an ancient schematic, all browned and what-not.

It’s actually a little difficult to get the model to render the bone colors correctly, even though it eventually starts to use the right tones after some simple feedback. This suggests that the visualization plays a strong role in image generation, and that’s really cool. It’s great news for detail-oriented illustrators everywhere.

Implications and Next Steps

On the surface, it seems like you want the clearest image possible, in exactly the right colors, in your Visual Fine Tuning File; otherwise the model will get confused.

This document is meant to be both Human and Machine readable—teaching both AI and DM to recreate the illustration. Visual consistency is important.

Therefore, I want to try a few iterations where there is some explicit copy in the Asset to the AI about ignoring that sepia effect, which is for the humans. If it doesn’t work, I’ll update the images in the proper colors.

Show All Angles

In other news, it is important in your Visual Fine Tuning to show the Object from as many angles as possible and to make sure they are clearly represented and labeled in your file.

While working on this series, I actually gave up. I wanted the model to wholly imagine the Bone Golem facing away, and gave very little in the way of verbal specifics. I couldn’t tell you how many images we iterated through without success.

It’s clear that this type of fine tuning file needs as much nuanced information of the object being reproduced as possible, from multiple angles, in order to be effective as a few-shot learning procedure.

Some Unexpected Magic

The magical thing about that whole back-shot I was trying to get was how it actually happened.

Artists can tell you: sometimes, after you’ve been beating your head against a block for a while, if you turn your attention elsewhere, the block just melts and everything comes together.

I guess it’s the same for ChatGPT.

Almost an hour after we tried for the Bone Golem facing away from the viewer and gave up, the model came up with these, most unexpectedly. I almost didn’t notice the change. It was pretty cool, you guys. Now I can pop into Photoshop, extract the big fellow’s back, and put it into the Adventure Assets.

Then Things Got Ridiculous

Next Steps: How Does the cGPT Code Interpreter Work?

In addition to updating the Adventure Asset for a few more tries, I think the way to manage large battles and interactions is by creating simple Python games, rather like complex chess boards.
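To make the "complex chess board" idea concrete, here is a minimal sketch of what such a Python battle grid might look like. Everything in it (the `Unit` and `Board` names, the hit points, the grid size) is made up for illustration; it is not taken from the Adventure Asset or from anything the cGPT currently does.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """One combatant on the grid (hypothetical fields for illustration)."""
    name: str
    hp: int
    x: int
    y: int


@dataclass
class Board:
    """A small rectangular battle grid that tracks where each unit stands."""
    width: int
    height: int
    units: dict = field(default_factory=dict)

    def _in_bounds(self, x, y):
        return 0 <= x < self.width and 0 <= y < self.height

    def place(self, unit):
        """Add a unit to the board, rejecting off-board positions."""
        if not self._in_bounds(unit.x, unit.y):
            raise ValueError(f"{unit.name} would be off the board")
        self.units[unit.name] = unit

    def move(self, name, dx, dy):
        """Move a unit by (dx, dy), rejecting off-board or occupied squares."""
        unit = self.units[name]
        nx, ny = unit.x + dx, unit.y + dy
        if not self._in_bounds(nx, ny):
            raise ValueError(f"{name} cannot leave the board")
        for other in self.units.values():
            if other.name != name and (other.x, other.y) == (nx, ny):
                raise ValueError(f"square ({nx}, {ny}) is occupied")
        unit.x, unit.y = nx, ny

    def render(self):
        """Return a text map: first letter of each unit, '.' for empty squares."""
        rows = []
        for y in range(self.height):
            row = ""
            for x in range(self.width):
                occupant = next(
                    (u for u in self.units.values() if (u.x, u.y) == (x, y)),
                    None,
                )
                row += occupant.name[0] if occupant else "."
            rows.append(row)
        return "\n".join(rows)
```

A DM-facing turn would then be something like `board.move("Golem", 1, 0)` followed by `print(board.render())`, so the text map lands in the chat where both the human and the model can read it.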

I know the cGPTs can execute code, but what I don’t know (does anyone?) is whether they can keep a state going in the background.

That is, the CustomGPT chat window’s advanced capabilities time out. (You can lose data if you’re not careful.) But does anyone know whether, while the chat is live, code can be run and maintained in the background? (We’re talking specifically about using the cGPT’s native abilities, not accessing via the API or using an Action.)

So can we have ChatGPT execute some code that will then await further instruction, OR does code have to be executed in single steps?

Based on this, it will either be possible to run simple interactions in the background; OR, execute code that outputs the full game state into the thread to await further instructions.
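The second option, writing the full game state into the thread, can be sketched as below. This assumes the worst case, where nothing survives between code executions, so each step reads the state text from the chat, applies one action, and prints the new state back out. The function names and the state layout are hypothetical; I don't know yet how the Code Interpreter actually behaves here.

```python
import copy
import json


def new_game():
    """Build a starting state as plain dicts and lists so it serializes to JSON."""
    return {
        "round": 1,
        "units": {
            "Bone Golem": {"hp": 40, "pos": [0, 0]},
            "Rogue": {"hp": 12, "pos": [3, 2]},
        },
        "log": [],
    }


def dump_state(state):
    """Serialize the whole game so it can be pasted into the thread."""
    return json.dumps(state, sort_keys=True)


def load_state(text):
    """Rebuild the game from text copied back out of the thread."""
    return json.loads(text)


def apply_attack(state, attacker, target, damage):
    """Return a NEW state with the attack applied; the input is left untouched."""
    new_state = copy.deepcopy(state)
    unit = new_state["units"][target]
    unit["hp"] = max(0, unit["hp"] - damage)  # hit points never go below zero
    new_state["log"].append(f"{attacker} hits {target} for {damage}")
    new_state["round"] += 1
    return new_state
```

Each turn is then a single, stateless step: `state = load_state(text_from_thread)`, apply one action, and `print(dump_state(state))`. If it turns out the cGPT can hold variables between executions while the chat is live, the same functions still work; the dump just becomes a backup against the timeout.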