The dndGPT Case Study for You and Me!

Thanks, @anon22939549, I have definitely decided not to poke sleeping bears and have rebranded the whole thing. Thanks, everyone, for helping me.

This image was chosen out of three iterations, “in a 1940’s newsprint comic-strip style,” then taken into Photoshop to correct hallucinations, largely the text.

P.S. We used Comic Sans. :heart_eyes:


Fine-Tuning Visual Fine-Tuning

Otherwise we’re continuing to experiment with so-called Visual Fine Tuning using the Adventure Assets presented above, and have had some unexpected progress.

"Fine tuning is for when it’s easier to “show, not tell.” Which is both the best explanation of “fine tuning” I’ve read so far, and literally what we’re doing here.

Interestingly, this advice, “to show, not tell,” is considered “good storytelling” in other professions: As in, this is the same advice given to creative writers when illustrating a point in a novel. I believe this parallel is a helpful metaphor for better fine tuning AI in general.


The Model Pays More Attention to the Image Than Expected

Perhaps the most unexpected thing in the current experimentation is the way the model pays attention to the images in the Adventure Asset (or Visual Fine Tuning File).


You see the golden-sepia hue our big buddy is taking on in these uncorrected images? He’s not exactly lit correctly for the scene. This color is not verbally prompted anywhere—in fact, “ivory molten bone” is specified.

This color is coming from the actual visualization of the Bone Golem as frequently presented in the Adventure Asset. (See above). :partying_face: :heart_eyes:

This is an interesting development.

The AA is presented like this because it’s supposed to look like an ancient schematic, all browned and what-not.

It’s actually a little difficult to get the model to get the bone colors correct, even though it eventually starts to use the right tones with some simple feedback. This means that the visualization plays a strong role in image generation, and that’s really cool. It’s great news for detail-oriented illustrators everywhere.


Implications and Next Steps

On the surface, it seems like you want to have the clearest image possible, in exactly the right colors, in your Visual Fine Tuning File; otherwise, the model will get confused.

This document is meant to be both Human and Machine readable—teaching both AI and DM to recreate the illustration. Visual consistency is important.

Therefore, I want to try a few iterations where there is some explicit copy in the Asset telling the AI to ignore that sepia effect, which is for the humans. If that doesn’t work, I’ll update the images with the proper colors.


Show All Angles

In other news, it is important in your Visual Fine Tuning to show the Object from as many angles as possible and to make sure those angles are clearly represented and labeled in your file.

While working on this series, I actually gave up. I wanted the model to wholly imagine the Bone Golem facing away, and gave very little in the way of verbal specifics. I couldn’t tell you how many images we iterated through without success.

It’s clear that this type of fine tuning file needs as much nuanced information about the object being reproduced as possible, from multiple angles, in order to be effective as a few-shot learning procedure.


Some Unexpected Magic

The magical thing about that whole back-shot I was trying to get was how it actually happened.

Artists can tell you: sometimes, after you’ve been beating your head against a block for a while, if you turn your attention elsewhere, the block just melts and everything comes together.

I guess it’s the same for ChatGPT.


Almost an hour after we had been going for the Bone Golem facing away from the viewer, and gave up, the model came up with these most unexpectedly. I almost didn’t notice the change. It was pretty cool, you guys. Now I can pop into Photoshop, extract the big fellow’s back, and put it into the Adventure Assets.


Then Things Got Ridiculous




Next Steps: How Does the cGPT Code Interpreter Work?

In addition to updating the Adventure Asset for a few more tries, I think the way to manage large battles and interactions is by creating simple Python games, rather like complex chess boards.

I know the cGPTs can execute code, but what I don’t know (and maybe someone does) is whether they can keep a state going in the background.

That is, the CustomGPT chat window’s advanced capabilities time out. (You can lose data if you’re not careful.) But does anyone know whether, while the chat is live, code can be run and maintained in the background? (We’re talking specifically about using the cGPT’s native abilities, not accessing via the API or using an Action.)

So, can we have ChatGPT execute some code that will then await further instruction; OR, does code have to be executed in single steps?

Based on the answer, it will either be possible to run simple interactions in the background, OR to execute code that outputs the full game state into the thread to await further instructions.
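For what it’s worth, here’s a minimal sketch of that second option, assuming the game state is just a plain dict the Code Interpreter rebuilds each turn; all of the names and numbers here are hypothetical:

```python
import json

# Nothing persists inside the Code Interpreter sandbox between turns, so the
# script rebuilds the state it is given, applies one action, and prints the
# complete state back into the thread for the next turn.
state = {
    "round": 3,
    "combatants": [
        {"name": "Bone Golem", "hp": 87, "position": [4, 7]},
        {"name": "Fighter", "hp": 31, "position": [5, 7]},
    ],
}

def apply_damage(state, target, damage):
    """Apply a single hit and return the updated state."""
    for combatant in state["combatants"]:
        if combatant["name"] == target:
            combatant["hp"] -= damage
    return state

state = apply_damage(state, target="Bone Golem", damage=9)

# Emit the full state so it lives in the conversation, not the sandbox.
print(json.dumps(state, indent=2))
```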


PDF Conversion to Spreadsheet Using GPT 4o-mini

Hey all,

We’re getting back to work converting the SRD into a more condensed Spreadsheet.

The recent release of 4o-mini makes using the Assistants API far more affordable for this task. All of a sudden, it is the better tool than the ChatGPT UI. Initial testing is going swimmingly. This is exactly what mini is meant for, and its arrival will really empower small businesses to get in this game.

Why Mini vs a CustomGPT

Our initial conclusion was that a CustomGPT was the best option for this type of task—i.e. converting standardized, though wildly variable, semantic pdf data into a table.

In recent weeks, the CustomGPT's (dndGPT's) performance has gotten considerably better, but there is still a lot of variability in its answers, and closer exploration of the data showed many nuanced errors and hallucinations. For example, a similar skill in a similar field would have a DC of 4 instead of 3, or only the first few sentences of an ability would be transferred.

These last experiences exquisitely illustrate the need for careful oversight, correction, and CritiqueGPT-Style Models in your workflow.

Creativity Control: Smart but not Creative

For this type of task it is definitely preferable to lower the creativity (temperature) of the model so it sticks to the exact (rather boring) forms required by the task.

This alone makes Mini the preferred tool.
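For reference, a minimal sketch of turning the creativity down on a single run, assuming the Assistants API (the IDs here are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical IDs; the point is only the temperature knob on the run.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    temperature=0.2,  # keep the model close to the exact (rather boring) forms
)
```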

But! This is further complicated by the many variations in the data and how it needs to be formatted. This requires intelligence.

Fine Grained Control of Files via Vector Storage

This is a hyper-specialized task with very specific instructions that I didn't want taking up space in the public dndGPT's knowledge base, potentially confusing the end user.

And upon consideration, it doesn't make sense to have the full SRD (a 400 page pdf) as the Knowledge Base for this task. (I'll show why below.)

Being able to define a Vector Store and relate only the immediately necessary files is very valuable for several reasons, such as reducing the likelihood that the model will hallucinate something from a different part of the manual.

Being able to train a hyper-specialized Mini Assistant makes it a great tool to use again later... like when they release the SRD 5.2.
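Here's roughly what that setup looks like, sketched against the Assistants API with File Search; the file name and instructions below are made up for illustration:

```python
from openai import OpenAI

client = OpenAI()

# A hyper-specialized "extraction" setup: one Vector Store holding only the
# files this task needs, attached to a single 4o-mini Assistant.
vector_store = client.beta.vector_stores.create(name="srd-monsters-only")

client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open("srd_monsters_section.pdf", "rb")],  # hypothetical file name
)

assistant = client.beta.assistants.create(
    name="SRD Monster Extractor",
    model="gpt-4o-mini",
    instructions=(
        "You extract a single monster's full stat block from the attached "
        "SRD files, verbatim, without adding or summarizing anything."
    ),
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```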

How to Organize Long Documentation in a Vector Store

Thanks to better analysis tools on the Platform, we can better understand how to set up a document search structure. The whole goal here is to really, really understand a particular document inside and out.

The SRD is an excellent source document for public experimentation. What I’m looking into is efficiency with regard to searching a) the full 400-page SRD pdf, b) the SRD divided into smaller, roughly 25-page pdf chapters, and c) the specific 100-page sub-section of the SRD I’m working with (the monsters).

A Vector Store was created for each file set, then attached to a 4o-mini Assistant, since we are only asking for simple search and retrieval of an entire monster (a value that varies between 400 and 1,000 [legacy] tokens per monster). These experiments were all performed with the “Aboleth,” an approximately 750-token monster.

First a Fun Mistake

Well, I made an error but discovered something interesting. The error was that my System Instructions declared the exact file in which to find the information I wanted to extract. :sweat_smile:

WHAT’S interesting is that this simple 20-word explanation enabled the model to find the exact information I was looking for no matter the source document. Check it out:

Results from the full document:

The SRD Divided into Chapters:

Only the three chapters of the SRD regarding monsters:

I expected the small section to be significantly more efficient than the divided section, and the divided significantly more efficient than the full document search. But both input and output tokens are relatively similar in all three experiments, suggesting that, if the model knows where to look, it can find that information with significant efficiency regardless of the size of the document.

This means that, if you vaguely know where to look for a thing and you tell the model, it can find it. The file name was enough information to find the requested subsection in the full document. Thus, a vague 20-word prompt can save both time and money.
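In my experiments the hint lived in the System Instructions; an equivalent way to pass it, sketched here with made-up IDs and a made-up file name, is per run via additional_instructions:

```python
from openai import OpenAI

client = OpenAI()

# A hypothetical ~20-word location hint attached to a single run.
run = client.beta.threads.runs.create(
    thread_id="thread_abc123",
    assistant_id="asst_abc123",
    additional_instructions=(
        "The monster stat blocks are in the file named "
        "'srd_monsters_section.pdf'; search that file first."
    ),
)
```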

Fully Searching a Document Without Knowing Location
Alright, I removed said 20 words from the System Instructions, and the results became more like what I expected in the first place:

Searching the Full Document Without Knowing Location

Searching the Full Document Split into Chapters

Searching a Small Subsection

Analysis

Alright, the biggest surprise here is what happens if you already kinda know the location of what you’re looking for.

If you don’t have, or don’t provide, that information, we can see the model using significantly more input tokens (18,624 vs. 37,809) to search the full document and extract the requested information. That’s 2.03x more input tokens when it doesn’t even vaguely know where to look.

Interestingly, if you don’t provide vague location information but have divided the source document into smaller (well-named) subsections, the model performs almost the same as when it already knows where to look: 18,742 vs. 19,193 input tokens.

What was expected, and is here demonstrated, is that, if you do not know the location, but have previously split the document into subsections, related them through the same Vector Store, and named the files appropriately, there are significant efficiency gains: 19,193 vs. 37,809 input tokens when searching the split vs. the full document. Again, roughly a 1.96x difference. :smirk:

What was unexpected, and is also demonstrated, is that further splitting the document into smaller subsections (only the area of the document with monsters, a 100-page subsection) didn’t yield significantly different input results from the fully divided document: 18,889 input tokens when knowing the location, and 19,490 input tokens when not.

Here, the win is the size of the Vector Store, which drops from 5 MB (for either the full document or the full split document) to 1 MB (for the small subsection). A minor efficiency gain. So, if you can take the time, splitting the full document is the way to go. If you’re in a hurry, grab just the relevant subsection.

Finally, it is awesome to note that, regardless of the input size, the model extracted the appropriate information with only one small variance in all six experiments. (It sometimes does/does not include information about custom instructions regarding the Aboleth’s Legendary Actions. :thinking:) You can see this in the stability of output tokens. That’s pretty cool.

Conclusions

  1. If you only kinda-sorta know where you’re looking in a document of any length, include that information in your prompt. It can save all sorts of compute and money. That’s wild.
  2. Dividing a long document into smaller, appropriately named sections related through the same Vector Store yields significant demonstrable search efficiencies. i.e., It is worth taking a long document and splitting it into chapters.
  3. Further dividing a document into a smaller vector store does decrease storage costs, but does not yield significant search efficiencies over the full split document in a single vector store.
  4. Regardless of input, the output was remarkably stable, with only the variance of a single paragraph in all six experiments.

That’s awesome :slightly_smiling_face: can it work with homebrew campaigns?

Hiya, welcome!

Yeah! It sure can!

You’ll need your campaign in a text or pdf format, with clear headings. Just call up @dndgpt with the attached campaign file, and tell it where you are at in the story.

It can help build balanced campaigns, too! Like making sure your monster encounters are level appropriate, helping you brew up something unique for the space, or illustrating the whole scene.

I’d like to standardize the campaign building / management process—meaning I want to provide some specifics for the GPT to follow when working with homebrew, and assisting during a campaign. If you have some time and would like to share your experience, shoot me a note, I’d really appreciate it!

Automating the SRD-to-Spreadsheet Task

The current goal of this project is to automate the data extraction from the SRD PDF using Python and Assistants, then input it into a CSV using Structured Output.

This has proven surprisingly challenging.

Persistent Issue with Name Lookup

It’s hard to look up names from the SRD PDF. :expressionless:

From the start of this project months ago, it's been hard to generate a simple list of names from the Monsters section of the SRD 5.1, and the problem persists with the API and mini.

The model usually gets the next few names in the list right, but then it starts pulling from all over the place and/or hallucinating answers. 4o-mini has gotten stuck in loops on pages in the SRD, for some reason having trouble remembering to "look on the next page" when performing a simple sequential lookup.

Even with the Price Reduction, 4o is Too Expensive for the Task

Interestingly, I reliably got 4o to retrieve the immediately next monster correctly. However, the Tokens In were equal to the amount needed to search and extract the information if you already knew the name.

(Screenshot: Assistant search comparison, 4o vs. mini.)

First, I was thinking you could use 4o to look up the name from the SRD, then pass the information via a Thread to a 4o-mini for extraction and pre-structuring, then pass it to another Assistant for Structured Output.

However, since the same amount of Tokens In are used as context for each model, this is inherently inefficient—you’re paying twice to search through the same data.

Furthermore, even if I just used 4o for the full retrieval of the next item, it would be prohibitively expensive. That’s around $0.30 per monster, and for roughly 300 monsters that comes to about $90 for the full extraction. 4o was also more likely to add information to the data set, which isn’t useful.

Hypothesis: Too Convoluted

Insofar as Data Extraction is concerned, the D&D SRD 5.1 is an edge case.

The reason I think identifying and listing the names of these monsters is hard is that the Source Document is convoluted. Even the names require some thought.

There are these Parent Monster headings that apply to some, not all, of the monsters—the heading looks slightly different, but there is no visual indication of when a monster is no longer nested beneath the parent beyond the context of what the monster is.

You can have a Parent Monster heading of "Ghoul," followed by a Ghast which is a "Ghoul", and a Ghoul... which is also a Ghoul...

My Solution

My solution here is, and has been, to 'roll up my sleeves,' and look up the names myself, then create the first part of the spreadsheet manually.

Ultimately there were so many variations, hallucinations, errors, and out-of-order information that even the simple lookup required so much oversight that I was doing the search myself anyway to confirm the next items on the list.

It's surprising that this simple name lookup takes so much intelligence, and, I guess, "discipline."

The Brand New dndGPT Monster Manual… Spreadsheet!

Howdy friends, happy fall!

I’m excited to share that dndGPT FINALLY got an update! That’s right, we’ve at last completed the brand new Monster Manual… spreadsheet! :man_lifting_weights:

Monster lookups (via the GPT) are far more efficient via the spreadsheet, and this condenses two files into one, freeing up space in the limited cGPT Knowledge Base.

Jump on to dndGPT and give it a try! You can do some interesting analysis now, like getting a list of all the monsters that have Legendary Abilities or other nonsense.

The dndGPT Monster Extractorizer—My First Data Molecule

The actual cool part here is that the spreadsheet extraction was fully automated using a Python script that called on multiple Assistants, both running 4o-mini: one attached to a Vector Store, the other using Structured Output. :eyes:

A “Data Molecule” is what I am calling a hyperspecialized cluster consisting of one data source, and any number of AI Assistants that tend to that data, and only that data. :atom_symbol:

You can check out the repo here. All of the files, System Instructions, and Structured Output schema are included.

The CSV is fully populated in 60-90 minutes, with over 5M input tokens to 800K output tokens, and it costs less than $0.90 to operate, with roughly 95% accuracy (which varies by cell). :face_holding_back_tears:
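For the curious, here’s a rough sketch of the shape of that flow, written against the Assistants API; the IDs, the schema, and the file names are placeholders rather than the actual repo code:

```python
import csv
import json
from openai import OpenAI

client = OpenAI()

# Two Assistants, both 4o-mini: one does file_search over a Vector Store of
# SRD files, the other converts the raw stat block into a fixed schema that
# becomes one CSV row. The schema below is a toy version for illustration.
MONSTER_SCHEMA = {
    "type": "json_schema",
    "json_schema": {
        "name": "monster_row",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "armor_class": {"type": "string"},
                "hit_points": {"type": "string"},
                "challenge_rating": {"type": "string"},
            },
            "required": ["name", "armor_class", "hit_points", "challenge_rating"],
            "additionalProperties": False,
        },
    },
}

def run_assistant(assistant_id, prompt, response_format=None):
    """Create a thread, run it to completion, and return the last message text."""
    thread = client.beta.threads.create(
        messages=[{"role": "user", "content": prompt}]
    )
    kwargs = {"thread_id": thread.id, "assistant_id": assistant_id}
    if response_format is not None:
        kwargs["response_format"] = response_format
    client.beta.threads.runs.create_and_poll(**kwargs)
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value

def extract_monster(name):
    # Step 1: the retrieval Assistant (attached to the Vector Store) finds the stat block.
    raw = run_assistant(
        "asst_search_123", f"Return the full SRD stat block for the {name}, verbatim."
    )
    # Step 2: the structuring Assistant converts it into the fixed schema.
    structured = run_assistant(
        "asst_structured_123",
        f"Convert this stat block into the required fields:\n\n{raw}",
        response_format=MONSTER_SCHEMA,
    )
    return json.loads(structured)

with open("monsters.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["name", "armor_class", "hit_points", "challenge_rating"]
    )
    writer.writeheader()
    for monster in ["Aboleth", "Ghast", "Ghoul"]:
        writer.writerow(extract_monster(monster))
```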

Structured Output Success!

Using Structured Response was a complete success.

The biggest variance in the spreadsheet is in the actual content. Here, I can see several places better prompting could help.

Code Written with ChatGPT

The code was 85% supported by ChatGPT. There is no way I could have done anything on this level on my own.

I “know what to do, but not how to do it.” With the extra help, and moving step by step to ensure compliance with proper principles, we were able to go above and beyond, like working out Analytics—which is why I can say with confidence that there is an average 95% accuracy for the content and 100% compliance with the Structured Output.

We’ve compared 6 full runs for similar results. (Which cost a total $5.95. Bwah-ha-ha-ha-ha-ha.)

Next Steps

Improving Accuracy

Bringing the overall accuracy against the source document to 100% is a great goal, but I suspect it will incur several types of diminishing returns as results approach 100%. Meaning, there is a cap on how much time should be spent further specializing this flow before saying “good enough.”

  • Increasing accuracy first will involve better prompting per each individual output cell, which Structured Response enables.

  • Fine Tuning might also help, though given the price difference, might not be economically worth it.

Initial Lookups and Search Accuracy via Items and Spells

The next two Data Molecules on my list are flows for converting the Items and Spells sections of the SRD—both of which have similar-but-different structures.

Next Steps for the Custom GPT

Now that some file space has been opened up without sacrificing accuracy, we can start to experiment with other things—such as “simple game states,” that have the model spin up some code to set [d&d] armies marching… literally!

Small Business Safe

This report is huge, friends: With a cost of $0.90 and 95% accuracy, we have an available flow which is safe and affordable for small business use.

Each iteration—each Data Molecule—that follows an ‘extraction’ archetype will be similar in concept, though different in practice. Which means there’s a real need for folks who understand how all of this comes together.

This simple, understated advance is the essence of where it looks like our economy is heading, and given the stability of the results, the doors are about to open.

Understanding Search Results


Hey everyone, I just pulled my first full list of run steps with the new ability to inspect search results.

I’ve been having a persistent issue trying to get a sequential list of names accurately from my source document, to no avail, using 4o-mini. 4o does it pretty well in two shots, but it’s expensive.

The new results offer insight. It’s easy to see what’s happening:

'step_details': {'tool_calls': [{'file_search': {
    'ranking_options': {'ranker': 'default_2024_08_21', 'score_threshold': 0.0},
    'results': [
        {'file_id': 'file-9zMgimMJVADemQDdvCCiYc9k',
         'file_name': 'dndgpt_srd_cc_v5.1_appendix_miscellanous_creatures.pdf',
         'score': 0.7806456593422816},
        {'file_id': 'file-9zMgimMJVADemQDdvCCiYc9k',
         'file_name': 'dndgpt_srd_cc_v5.1_appendix_miscellanous_creatures.pdf',
         'score': 0.7586611131201971},
        {'file_id': 'file-9zMgimMJVADemQDdvCCiYc9k',
         'file_name': 'dndgpt_srd_cc_v5.1_appendix_miscellanous_creatures.pdf',
         'score': 0.7519938615983864},
        {'file_id': 'file-9zMgimMJVADemQDdvCCiYc9k',
         'file_name': 'dndgpt_srd_cc_v5.1_appendix_miscellanous_creatures.pdf',
         'score': 0.7138062447563812},
        {'file_id': 'file-9zMgimMJVADemQDdvCCiYc9k',
         'file_name': 'dndgpt_srd_cc_v5.1_appendix_miscellanous_creatures.pdf',
         'score': 0.7105510809232527},
        {'file_id': 'file-9zMgimMJVADemQDdvCCiYc9k',
         'file_name': ...

1. You Can Restrict Search by the File Name.

The User Message names the specific file I’m looking in, while the Assistant and the Vector Store are designed to handle multiple files. Pretty neat.

2. Search Relevance Drops Quickly

At least in this case, the model gives the top result a Search Relevance score of about 78%, which drops by roughly 10% over the first ten results. You can see this reflected in the returned quality.

I’ve got some ideas for how to work with this. I’d really like Mini to be able to perform this task, and I’m genuinely confused why it’s difficult for it when it can perform a full lookup just fine.
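For anyone who wants to pull the same run-step search results, here’s a minimal sketch assuming the Python SDK’s include parameter (the thread and run IDs are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Ask for the file_search results to be included in the run steps, then print
# each result's file name and relevance score, like the dump above.
steps = client.beta.threads.runs.steps.list(
    thread_id="thread_abc123",
    run_id="run_abc123",
    include=["step_details.tool_calls[*].file_search.results[*].content"],
)

for step in steps.data:
    details = step.step_details
    if details.type != "tool_calls":
        continue
    for call in details.tool_calls:
        if call.type != "file_search":
            continue
        for result in call.file_search.results:
            print(result.file_name, round(result.score, 3))
```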
